Summary

This Report is based on a BBC tutorial paper describing several bit-rate-reduction 
techniques which have been studied at the BBC and which are used for the transmission of 
digital television signals. The techniques described are Sample-Rate Reduction, Differential 
Pulse Code Modulation (DPCM) coding - including a review of linear prediction theory and 
quantisation, Transform Coding (DCT), Entropy Coding and Vector Quantisation. The text 
of the tutorial was first prepared in 1986 before some of the recent advances in low bit-rate 
coding for television involving techniques such as motion compensated interframe prediction 
and subband coding. However, the material presented here continues to prove itself to be a 
useful introduction to the subject and therefore it has been reprinted in the hope that it will 
give useful insights to engineers requiring a reasonably detailed introduction to bit-rate 
reduction of television signals. 
BIT-RATE REDUCTION OF DIGITAL TELEVISION FOR TRANSMISSION:

AN INTRODUCTORY REVIEW 

N.D. Wells, B.A., D.Phil. 
1. INTRODUCTION 

The advantages of using digital techniques for transmission are well known. Signals coded in digital form 
are, in the main, resistant to the effects of distortion and noise introduced by a transmission link and can be 
regenerated at intervals without loss of fidelity. Accordingly, the quality of a television signal (video and audio) 
transmitted in digital form is determined primarily by the digital coding method used, rather than by the link itself. 
Digital techniques also offer greater flexibility in multiplexing and switching operations. 

For these reasons among others, telecommunications administrations around the world are changing to 
digital networks to carry telephony, television and data services. Standard hierarchical bit rates for access to such 
networks apply, and in Europe, at the present time, these hierarchical levels are approximately 64 kbit/s, 
2048 kbit/s, 8448 kbit/s, 34.368 Mbit/s and 139.264 Mbit/s. (These last two bit rates are generally referred to as
34 Mbit/s and 140 Mbit/s respectively.) 

At present* in the UK, and probably for many years, the television signal will be originated and radiated in 
composite PAL form. Thus, there will be a continuing requirement for the distribution of composite signals, 
possibly in digital form. Straightforward PCM coding of composite PAL signals generates a digital signal with a bit 
rate of over 100 Mbit/s which might prove expensive to transmit without bit-rate reduction. 

As future all-digital studios come into service there will be a need to interconnect them using a signal 
format based on the internationally agreed (CCIR Recommendation 601) 4:2:2 format for digital YUV component 
signals used within studio centres. This standard requires a sampling frequency of 13.5 MHz for the luminance 
component (Y) and 6.75 MHz for each of the chrominance components (U and V). This generates a source signal
with a total bit rate of 216 Mbit/s. Bit-rate reduction is required if this component-based signal is to be carried 
long distances on the digital communication network. 
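The source bit rates quoted above follow from multiplying the total sample rate by the number of bits per sample. The short Python sketch below assumes 8-bit samples (the sample precision is not stated in the text) and, for the composite case, sampling at four times the PAL colour subcarrier frequency, which is an illustrative choice rather than a figure taken from this Report.

```python
# Gross bit rate of uncompressed PCM video. 8 bits per sample is an assumption
# of this sketch; the text gives only the resulting totals.
BITS_PER_SAMPLE = 8


def pcm_bit_rate(sample_rates_hz, bits=BITS_PER_SAMPLE):
    """Total bit rate (bit/s) for signals sampled simultaneously at the given rates."""
    return sum(sample_rates_hz) * bits


# CCIR Rec. 601 4:2:2: Y at 13.5 MHz, U and V at 6.75 MHz each.
rec601 = pcm_bit_rate([13.5e6, 6.75e6, 6.75e6])

# Composite PAL sampled at 4 x the colour subcarrier (4.43361875 MHz);
# the 4 fsc sampling rate is an illustrative assumption.
pal_4fsc = pcm_bit_rate([4 * 4.43361875e6])

print(f"4:2:2 components: {rec601 / 1e6:.1f} Mbit/s")
print(f"PAL at 4 fsc:     {pal_4fsc / 1e6:.1f} Mbit/s")
```

With these assumptions the component signal comes to 216 Mbit/s, and the composite signal lands comfortably above the 100 Mbit/s mentioned earlier.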

Several different techniques have been, and continue to be, developed for the bit-rate reduction of video 
signals. Most involve some reduction in picture quality resulting from the removal of picture information and from 
the addition of noise and distortion in regions of the picture where the eye is least sensitive to the impairments. In 
general, the final picture quality depends on the degree of bit-rate reduction and on the complexity of the coding 
and decoding algorithms employed. 

For broadcast quality applications, two separate quality objectives can be aimed at, one for the distribution 
link and the other for the contribution link. The distribution link carries the final studio output to the transmitter 
and the home, after which no processing of the signal will occur (except signal reconstruction in the home receiver). 
On the other hand a contribution link carries signals between studio centres or from outside broadcast units to 
studio centres and the quality requirements are higher than for a distribution link. This is because it is likely that 
contribution signals will be further processed for special effects which depend on high quality source signals. 

At present, distribution and contribution circuits use composite analogue signals and there is little 
distinction between the two types of link. The comparatively poor separation between the luminance and 
chrominance components in the composite signal limits the quality of any subsequent special effects processing. 
However, the new standard for digital component signals in the studio (CCIR Rec. 601), and the increasing 
availability of digital links offer the prospect that, in future, the link quality will be high enough to allow
high-quality processing of a contribution signal.

An obvious bit-rate-reduction target is to compress the video signal into 34 Mbit/s for point-to-point 
transmission over the digital network. Whilst this compression can be achieved for a PAL signal without noticeable 
loss of quality it has not yet* been shown that the full quality of the component standard can be maintained at this 
low bit rate. At 140 Mbit/s the full quality can be maintained, but the cost for the broadcaster of the use of such 
a high-capacity link could be prohibitive. It should be possible to code the digital component signal at full quality
with a transmission bit rate of between 68 and 70 Mbit/s although, at the present time, no access at this bit rate
is provided by the European hierarchical network.

* Note: this manuscript was first prepared in 1986.

Many good reviews have been published covering the field of digital coding and bit-rate reduction of 
television signals (for example see Refs. 1 and 2). Nevertheless, it is a wide field and, of necessity, a review can 
only be a chart to guide the reader through the maze of a large bibliography. It is not the purpose of this Report 
to give another review covering the whole subject. Rather, it is an attempt to describe in more detail some of the 
techniques, studied at BBC Research Department, which have proved or might be likely to prove successful and 
adequate for broadcast-quality coding. The techniques covered in this Report include sample-rate reduction, 
differential coding (DPCM), transform coding, variable-length (or entropy) coding and vector quantisation.

Previous work at Research Department has produced several experimental bit-rate-reduction systems, some 
of which have been used in field trials of digital television transmission. An early system coded one PAL video 
signal and six high-quality sound channels into a 60 Mbit/s package; this system was tested over both terrestrial
and satellite links (Ref. 3). Next, experimental equipment was constructed to reduce the bit rate of a digital PAL signal
to within 34 Mbit/s (Ref. 4). After careful appraisal of the BBC's requirements, it was decided to develop service PAL
transmission equipment, operating at 68 Mbit/s, to convey one PAL signal plus a multiplex of many other
services (Ref. 5). Two such equipments could be combined to form a standard 140 Mbit/s package for connection to the
national network. This system was tested on a 9-month field trial over a link between BBC premises in London
and Birmingham (Ref. 6). Experimental equipment was also constructed to format a YUV digital component signal into
a package at 140 Mbit/s and this equipment has also been tested on the London to Birmingham link (Ref. 7).

2. SAMPLE RATE REDUCTION 
2.1 Review of sampling theory 

One method of reducing the bit rate of digitally coded signals is to reduce the sampling frequency in the 
encoder. According to the Nyquist sampling theorem the minimum sampling frequency that can be used, without 
introducing unwanted alias components into the decoded analogue signal, is equal to twice the highest frequency of 
the original analogue signal; this minimum frequency is often referred to as the Nyquist sampling frequency. This is 
illustrated in its simplest form in Fig. 1(a) which shows the spectrum of a video signal after sampling (assuming 
that the samples are produced as narrow impulses). Provided that the bandwidth of the original signal is limited,
such that $f_{max} < f_s/2$, then the signal can be recovered after sampling without distortion, by low-pass filtering
(post-filtering).
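As a minimal illustration of the sampling theorem, the Python sketch below (a hypothetical helper, not part of the Report) computes where a sinusoid appears after sampling at $f_s$. Components above $f_s/2$ fold back about multiples of $f_s$, so once sampled they cannot be distinguished from lower-frequency components; this is why they must be removed by the pre-filter before sampling.

```python
import math


def apparent_frequency(f, fs):
    """Frequency at which a sinusoid of frequency f appears after sampling at fs.

    Components above fs/2 fold back about the nearest multiple of fs.
    """
    return abs(f - fs * round(f / fs))


def sampled_cosine(f, fs, n_samples):
    """Sample values of cos(2 pi f t) taken at t = n / fs."""
    return [math.cos(2 * math.pi * f * n / fs) for n in range(n_samples)]


fs = 13.5e6  # e.g. the Rec. 601 luminance sampling frequency
for f in (5.0e6, 6.0e6, 7.5e6, 9.0e6):
    print(f"{f / 1e6:4.1f} MHz appears at {apparent_frequency(f, fs) / 1e6:4.2f} MHz")

# A 7.5 MHz input produces the same samples as a 6.0 MHz one (13.5 - 7.5 = 6.0):
a = sampled_cosine(7.5e6, fs, 16)
b = sampled_cosine(6.0e6, fs, 16)
print(max(abs(x - y) for x, y in zip(a, b)))  # only floating-point rounding remains
```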
The process of sampling is equivalent to multiplying the analogue signal, g(t), by a series of impulse or
delta functions. The sampled signal, s(t), is then given by:

$$s(t) = g(t)\sum_{n=-\infty}^{\infty}\delta(t-nT) = \sum_{n=-\infty}^{\infty} g(nT)\,\delta(t-nT) \tag{2.1}$$

where T is the sampling interval. The spectrum of a series of delta functions is itself a series of delta functions in
the frequency domain separated in frequency by an amount 1/T (see for example Ref. 8). The spectrum, $S(\omega)$, of
$s(t)$ is then obtained by convolving $G(\omega)$ with the delta function spectrum. $S(\omega)$ is then as shown in Fig. 1(a) and
is given by

$$S(\omega) = \frac{1}{T}\sum_{n=-\infty}^{\infty} G(\omega - n\omega_0), \qquad \text{where } \omega_0 = 2\pi/T \tag{2.2}$$

The process of sampling adds energy to the original signal, as each sampling delta function itself contains 
infinite energy. In practice however, sampled signals are of a non-return to zero (NRZ) form. This NRZ signal can 
be obtained by convolving the impulse sampled signal s(t) with a function, c(t), where 

$$c(t) = \begin{cases} 1, & -T/2 < t < T/2 \\ 0, & |t| > T/2 \end{cases}$$

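The Fourier transform, $C(\omega)$, of $c(t)$ is given by

$$C(\omega) = T\,\frac{\sin(\omega T/2)}{\omega T/2}$$

The spectrum of the NRZ signal is then as shown in Fig. 1(b) and is given by

$$C(\omega)S(\omega) = \frac{\sin(\omega T/2)}{\omega T/2}\sum_{n=-\infty}^{\infty} G(\omega - n\omega_0)$$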

Often it is profitable to consider the video spectrum as a three-dimensional (3-D) function, and sampling as 
a 3-D process. Thus, the scene to be scanned is considered as a function, $g(x,y,t)$, of x and y coordinates and time,
having a 3-D spectrum, $G(\omega_x,\omega_y,\omega_t)$. The scanning device samples the scene in the vertical (y) direction and in
time. Sampling in the horizontal direction occurs during the process of analogue-to-digital conversion. The
sampling causes the spectrum to be repeated at harmonics of the 3-D sampling frequencies. For example, line-locked
sampling, combined with interlaced scanning, gives the sampling structure shown in Figs. 2(a) and 2(b).
Figs. 2(c) and 2(d) respectively give the vertical/horizontal and the vertical/temporal projections of this sampling 
pattern in the frequency domain. 

It can be seen that, just as in the one-dimensional case, if the 3-D bandwidth of the original signal is 
limited appropriately by suitable low-pass filtering in three dimensions, then the original signal can be recovered 
after sampling without distortion. 

For component coded signals, there are many possible ways of choosing 3-D sampling patterns. An 
optimum sampling pattern minimises the overall sample rate while requiring that the signal bandwidth is restricted 
only in regions where the loss of bandwidth is least noticeable. 

For any but the simplest one-dimensional sample rate reduction, the filtering is best performed digitally. 
The process of digital filtering, combined with sample-rate reduction, may be understood by reference to Fig. 3. A 
signal, g(t), is sampled at a frequency $f_1$ and filtered by a low-pass filter with an impulse response, h(t). The
output of this filter is resampled at $f_2$ and the final sample values will be a weighted sum of the $f_1$ impulse values.
The coefficients in this summation depend on the impulse response, h(t), and also on the position of an individual
output sample in relation to the input samples. This process of downsampling may be seen, therefore, as one of
interpolation (Ref. 9). Similarly, Fig. 3 describes the process of post-filtering and sample-rate up-conversion if $f_2 > f_1$.
The implementation of these interpolating filters is simplest when the ratio of $f_1$ to $f_2$ is 2:1 or 1:2.
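The 2:1 and 1:2 cases can be sketched in a few lines of Python. The half-band taps below are an illustrative choice (assumed, not the filters used in the BBC equipment), and edge samples are simply replicated. The up-converter is written in 'polyphase' form: it is equivalent to inserting zeros between the input samples and filtering with twice the taps, but it makes the position-dependent weights explicit.

```python
# Illustrative half-band low-pass filter; the taps sum to 1 (unity gain at d.c.).
TAPS = [-1 / 32, 0.0, 9 / 32, 16 / 32, 9 / 32, 0.0, -1 / 32]


def _clamp(i, n):
    """Replicate edge samples for indices outside 0..n-1."""
    return min(max(i, 0), n - 1)


def downsample_2to1(x):
    """Pre-filter at the input rate f1, then keep every second sample (f2 = f1/2)."""
    half = len(TAPS) // 2
    n = len(x)
    filtered = [sum(h * x[_clamp(i + k - half, n)] for k, h in enumerate(TAPS))
                for i in range(n)]
    return filtered[::2]


def upsample_1to2(x):
    """Interpolate to twice the rate (f2 = 2 f1).

    Same result as zero-stuffing and filtering with 2 * TAPS: even outputs are
    the input samples, odd outputs are a four-tap weighted sum of neighbours.
    """
    n = len(x)
    out = []
    for m in range(n):
        out.append(x[m])
        mid = (-x[_clamp(m - 1, n)] + 9 * x[m] + 9 * x[_clamp(m + 1, n)]
               - x[_clamp(m + 2, n)]) / 16
        out.append(mid)
    return out


print(downsample_2to1([1.0] * 10))
print(upsample_1to2([0.0, 1.0, 2.0, 3.0, 4.0, 5.0]))
```

Even output samples of the up-converter use a single weight of 1, while odd output samples use the weights (-1, 9, 9, -1)/16; this is the dependence of the coefficients on output-sample position described above.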
(a) Two-field sampling structure in horizontal/vertical plane.
(b) Sampling structure in vertical/temporal plane.
(d) Repeat spectra in $\omega_y$, $\omega_t$ domain.
Fig. 2 - Examples of sampling structures and repeat spectra.

Fig. 3 - Illustration of process of sample-rate changing.

When choosing a suitable reduced-rate sampling grid for a particular application, it is necessary to 
determine the centres of the repeat spectra in the frequency domain. The theory is given in the following 
sub-section. 
2.2 Determination of repeat spectra in the frequency domain

For many sampling structures, the sampling grid can be described by the points of intersection of three sets 
of planes. If the distances separating the planes measured along the x, y and t axes are $X_i$, $Y_i$ and $T_i$ respectively
for the i-th set of planes, then the centres of the repeat spectra in the frequency domain are described by the vectors

$$\mathbf{f} = k_1\mathbf{V}_1 + k_2\mathbf{V}_2 + k_3\mathbf{V}_3, \qquad k_1, k_2, k_3 = 0, \pm 1, \pm 2, \ldots$$

where $\mathbf{V}_i$ is a vector with projections $1/X_i$, $1/Y_i$, $1/T_i$ on the $f_x$, $f_y$, $f_t$ axes respectively, and $k_1$, $k_2$ and $k_3$ are
integers (Refs. 10, 11 and 12).
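An equivalent way to obtain the vectors $\mathbf{V}_i$, given here as a standard matrix restatement rather than the Report's own procedure, is to write the sampling grid as all integer combinations of basis vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots$ and take the rows of the inverse of the matrix whose columns are those basis vectors. The Python sketch below does this in two dimensions (x, y) for a line-quincunx grid; the 3-D case only needs a 3 × 3 inverse. Exact fractions are used so that the repeat centres print cleanly, and the unit spacings are chosen purely for illustration.

```python
from fractions import Fraction as F
from itertools import product


def repeat_vectors(a1, a2):
    """Return V1, V2 for a 2-D sampling lattice with basis vectors a1, a2.

    V1, V2 are the rows of the inverse of the matrix whose columns are a1, a2,
    so that Vi . aj = 1 when i == j and 0 otherwise.
    """
    det = a1[0] * a2[1] - a2[0] * a1[1]
    v1 = (a2[1] / det, -a2[0] / det)
    v2 = (-a1[1] / det, a1[0] / det)
    return v1, v2


def repeat_centres(a1, a2, kmax=2):
    """All centres k1*V1 + k2*V2 with -kmax <= k1, k2 <= kmax."""
    v1, v2 = repeat_vectors(a1, a2)
    return {(k1 * v1[0] + k2 * v2[0], k1 * v1[1] + k2 * v2[1])
            for k1, k2 in product(range(-kmax, kmax + 1), repeat=2)}


# Line-quincunx grid: sample spacing X = 1 along each line, line spacing Y = 1,
# alternate lines offset by X/2.
a1 = (F(1), F(0))
a2 = (F(1, 2), F(1))
centres = repeat_centres(a1, a2)
near = sorted(c for c in centres if 0 <= c[0] <= 1 and 0 <= c[1] <= 1)
print([(str(fx), str(fy)) for fx, fy in near])  # [('0', '0'), ('0', '1'), ('1', '1/2')]
```

For this grid the nearest repeat spectra lie at $(1/X, \pm 1/2Y)$ and $(0, 1/Y)$, and there is no repeat at $(1/X, 0)$; this offset of the repeats is what allows line-quincunx sampling to support a diamond-shaped baseband.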

For more complicated sampling grids, the total sampling pattern can be considered as a superposition of 
identical orthogonal grids which are shifted relative to one another. If the basic orthogonal grid of sampling 
impulses d(x,y,t) is defined by

$$d(x,y,t) = \sum_j\sum_k\sum_l \delta(x-jX,\; y-kY,\; t-lT)$$

where j, k, l are integers, the spectrum $D(f_x,f_y,f_t)$ of this grid is given by

$$D(f_x,f_y,f_t) = \frac{1}{XYT}\sum_p\sum_q\sum_r \delta(f_x-p/X,\; f_y-q/Y,\; f_t-r/T)$$

A second grid shifted by $a_xX$, $a_yY$, $a_tT$, where $a_x$, $a_y$, $a_t$ are less than unity, has a Fourier transform $D(f_x,f_y,f_t,\mathbf{a})$
given by

$$D(f_x,f_y,f_t,\mathbf{a}) = D(f_x,f_y,f_t)\exp\left[-j2\pi\left(a_xXf_x + a_yYf_y + a_tTf_t\right)\right]$$

Then, for a sampling pattern described by the superposition of N grids defined by shift vectors
$\mathbf{a}_i = (a_{x,i}X,\, a_{y,i}Y,\, a_{t,i}T)$ (where $\mathbf{a}_0 = 0$), the Fourier transform is

$$D(f_x,f_y,f_t)\left[1 + \sum_{i=1}^{N-1}\exp(-j2\pi\,\mathbf{f}\cdot\mathbf{a}_i)\right]$$

The magnitudes of the spectral impulses at $\mathbf{f} = (p/X, q/Y, r/T)$ are therefore modified by the function

$$\left|1 + \sum_{i=1}^{N-1}\exp\left[-j2\pi\left(a_{x,i}\,p + a_{y,i}\,q + a_{t,i}\,r\right)\right]\right|$$

which has zeros for certain values of p,q,r. The final sampled spectrum is then given by the convolution of this 
Fourier transform with the baseband signal spectrum. 
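The superposition method can be checked on a line-quincunx grid. In the Python sketch below (helper names are hypothetical), the quincunx pattern is treated as two orthogonal grids, the second shifted by half a sample period horizontally and half the base-grid line spacing vertically, and the weighting factor is evaluated at each impulse position $(p/X, q/Y)$, where the phase reduces to $a_{x,i}p + a_{y,i}q$.

```python
import cmath


def impulse_weight(p, q, shifts):
    """|1 + sum_i exp(-j 2 pi (a_x,i p + a_y,i q))| at f = (p/X, q/Y).

    shifts: fractional offsets (a_x, a_y) of grids 1..N-1; grid 0 is unshifted.
    """
    s = 1 + sum(cmath.exp(-2j * cmath.pi * (ax * p + ay * q)) for ax, ay in shifts)
    return abs(s)


# Line-quincunx as two orthogonal grids. Base grid: spacing X along the line and
# Y = two line pitches vertically. Second grid shifted by X/2 and Y/2.
shifts = [(0.5, 0.5)]
for p in range(3):
    print(p, [round(impulse_weight(p, q, shifts), 6) for q in range(3)])
```

The factor is 2 where p + q is even and zero where p + q is odd, so half of the orthogonal grid's repeat spectra vanish, leaving the offset pattern of repeats characteristic of quincunx sampling.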

2.3 Sample-rate reduction of the colour difference components 

The colour difference signal bandwidth carried by digital signals coded in accordance with CCIR Rec. 601 
is determined by the possible requirement for special-effects processing of the signal within the studio. The 
bandwidth can, however, be reduced further by a factor of about two in both horizontal and vertical directions 
before any subjective impairment is perceptible on typical pictures. It is very common, therefore, in bit-rate-reduction
systems, for the colour difference sampling frequencies and bandwidths to be reduced from those of the
source standard. 

2.3.1 Reduction of vertical sampling rate 

A common method for halving the sampling rate of the colour difference components is to reduce the
vertical sampling frequency for both components by transmitting only U or V on each line. Provided that each
component is vertically low-pass filtered before sub-sampling, the two-dimensional (2-D) spectrum will be as
shown in Fig. 4. The baseband spectrum can be recovered, without aliasing distortion, by intrafield interpolation to
reconstruct the colour components on all field lines.

Fig. 4 - 2:1 vertical sampling.
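The line-alternating scheme can be sketched in Python as follows. This is a deliberately crude illustration under stated assumptions: one colour-difference value per field line (a single vertical column), U on even lines and V on odd lines, no vertical pre-filter, and a two-line average as the interpolator. A practical system would use proper vertical pre- and post-filters, as the text notes.

```python
def transmit_alternate_lines(u, v):
    """Send U on even-numbered field lines and V on odd-numbered ones.

    u, v: one colour-difference value per field line. The even/odd
    assignment is an assumption of this sketch.
    """
    return [('U', u[n]) if n % 2 == 0 else ('V', v[n]) for n in range(len(u))]


def reconstruct(sent, component):
    """Rebuild one component on every field line by averaging the nearest
    transmitted lines above and below (simple linear intrafield interpolation)."""
    known = {n: val for n, (c, val) in enumerate(sent) if c == component}
    out = []
    for n in range(len(sent)):
        if n in known:
            out.append(known[n])
        else:
            neighbours = [known[m] for m in (n - 1, n + 1) if m in known]
            out.append(sum(neighbours) / len(neighbours))
    return out


u = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # vertical ramp in U
v = [10.0] * 7                             # flat V
sent = transmit_alternate_lines(u, v)
print(reconstruct(sent, 'U'))
print(reconstruct(sent, 'V'))
```

The vertical ramp is recovered exactly because linear interpolation is exact for linear variation; a sharp vertical chrominance transition would be softened, which is the loss of vertical resolution discussed below.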

If the subsampling is as shown in Fig. 5(a) and is not 'reset' at the start of each frame, then the spectral 
repeats in the vertical/temporal frequency plane are as shown in Fig. 5(b). Any residual alias components
appearing in the baseband spectrum have a frequency of 12.5 Hz (for stationary detail). The most noticeable effect
caused by this aliasing is that it gives rise to 12.5 Hz flicker on large-amplitude horizontal chrominance transitions.
However, in a 625-line system and with good vertical pre- and post-filtering, this method of colour difference
sample-rate reduction gives rise to only a very slight loss in subjective picture quality on most pictures. The loss of 
vertical resolution is only visible at horizontal transitions where a boundary is defined by a colour difference 
change which is not accompanied by a luminance change. The resolution loss is also visible on some graphical 
material such as horizontal colour bars. 

If a 2:1 subsampling is achieved by a line-offset or 'line-quincunx' sampling pattern as shown in Fig. 6(a) 
then the centres of the repeat spectra will be as shown in Fig. 6(b). This figure also shows a possible 2-D 
boundary shape for a baseband spectrum that this line-quincunx sampling can support. Other shapes are possible 
and some different examples are given in following sections. The area of the diamond shape in Fig. 6(b) is the 
same as the area of the rectangular shape shown in Fig. 4(b). However, it is probably the minimum value of the 
bandwidth in any direction which determines perceived image quality and not the area contained within the
bandwidth boundary.

(a) 2:1 line subsampling structure for U chrominance component. (b) Spectrum of subsampled U chrominance.
Fig. 5 - Examples of line-subsampled chrominance.

Fig. 6 - 2:1 subsampling using a line-quincunx structure.

For line-quincunx sampling as shown, the minimum value of the bandwidth is along the
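diagonal, and the maximum bandwidth, B, in this direction is given by

$$B = \frac{(1/2X)(1/4Y)}{\left[(1/2X)^2 + (1/4Y)^2\right]^{1/2}}$$

where X is the horizontal sampling interval and Y is the picture line spacing, both expressed in equivalent units.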

For example, with 625-line interlaced scanning and 2:1 vertical subsampling and a horizontal sampling frequency 
of 6.75 MHz, the maximum horizontal bandwidth is 3.375 MHz and the maximum vertical bandwidth is 78 cycles 
per picture height (c.p.h.), which is equivalent to approximately 1.7 MHz in the horizontal direction. With
line-quincunx sampling, the maximum value of the vertical bandwidth increases to 3.4 MHz (horizontal equivalent)
and the equivalent value of the bandwidth in the diagonal direction is 2.4 MHz. On critical pictures, this increase 
in minimum bandwidth gives a perceptible improvement in quality. This improvement is obtained of course at the 
expense of an increase in complexity of the bandwidth-defining filters. 
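The diagonal figure quoted above can be checked with a little arithmetic. For a diamond-shaped passband whose vertices lie at $h$ on the horizontal axis and $v$ on the vertical axis (both in the same units), the distance from the origin to its edge along the diagonal is $hv/\sqrt{h^2+v^2}$. The Python sketch below uses the horizontal-equivalent values given in the text; the helper name is hypothetical.

```python
import math


def min_bandwidth_quincunx(h_max, v_max):
    """Bandwidth along the diagonal of a diamond-shaped passband whose vertices
    are at h_max (horizontal) and v_max (vertical), both in the same units."""
    return h_max * v_max / math.hypot(h_max, v_max)


# Values quoted in the text, in MHz (vertical expressed as horizontal equivalent).
orthogonal_min = min(3.375, 1.7)                  # rectangular passband: smaller side
quincunx_min = min_bandwidth_quincunx(3.375, 3.4)
print(f"orthogonal 2:1 subsampling: {orthogonal_min:.1f} MHz")
print(f"line-quincunx:              {quincunx_min:.1f} MHz")
```

The minimum bandwidth rises from 1.7 MHz to about 2.4 MHz, which is the improvement on critical pictures described above.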

2.3.2 Reduction of temporal sampling rate 

Another technique that has been used for reducing the sampling rate of the colour difference components is 
to halve the temporal sampling frequency; i.e. the colour information is transmitted for only one field per frame. 
This gives repeat spectra for each component as in Fig. 7. In this case, the maximum temporal bandwidth that can 
be supported is one quarter of the field frequency. 
(a) Temporal subsampling pattern. (b) Vertical/temporal spectrum for temporal subsampling.

Fig. 7 - Spectrum for temporally subsampled chrominance. 
Reducing the temporal bandwidth has the effect of reducing the spatial bandwidth for moving images. For
a given speed of motion, the spatial bandwidth in the direction of motion may be calculated as follows: 

Consider a vertical grating defined by $\cos(2\pi f_x x)$ moving with a velocity, u, in the horizontal direction. Then, at a
given point in the picture, the magnitude of the temporal frequency, $f_t$, is

$$f_t = u f_x$$

For a maximum temporal bandwidth, $B_t$, the maximum spatial bandwidth, $B_x$, for a given speed of motion is then

$$B_x = B_t / u$$

If $B_t$ is equal to 12.5 Hz, then the variation of $B_x$ with u is given by the full curve in Fig. 8. A convenient unit for
measuring speed of motion is 'picture widths per second'. $B_x$ is then in units of cycles per picture width (c.p.w.).
Considering displayed picture width (rather than total line length), 1 c.p.w. corresponds to approximately 0.02 MHz.

Since the temporal subsampling and interpolating filters are constructed around field delays, practical filters 
will contain a minimum number of these large delays. Therefore, it is probable that the 6 dB temporal bandwidth 
will be close to 6.25 Hz rather than 12.5 Hz giving a spatial bandwidth as a function of speed as shown by the 
dotted curve in Fig. 8. It can be seen that at movements as slow as 0.1 picture widths per second, the spatial 
bandwidth is reduced to 1 MHz. Temporal subsampling of the colour difference component gives, therefore, a 
perceptible loss of resolution on critical pictures containing movement. It should be noted that if this same form of 
straightforward temporal subsampling were to be applied to the luminance component then the resolution loss with 
motion would be very apparent. 
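Both curves follow directly from $B_x = B_t/u$. The Python sketch below tabulates a few points for the ideal (12.5 Hz) and practical (6.25 Hz) temporal bandwidths, using the approximate conversion of 0.02 MHz per c.p.w. quoted in the text; the chosen speeds are arbitrary sample values and the helper name is hypothetical.

```python
MHZ_PER_CPW = 0.02  # approximate conversion for displayed picture width (from the text)


def spatial_bandwidth_cpw(temporal_bw_hz, speed_pw_per_s):
    """B_x = B_t / u, in cycles per picture width."""
    return temporal_bw_hz / speed_pw_per_s


for bt in (12.5, 6.25):               # ideal and practical (6 dB) temporal bandwidths
    for u in (0.1, 0.25, 0.5, 1.0):   # speeds in picture widths per second
        bx = spatial_bandwidth_cpw(bt, u)
        print(f"B_t = {bt:5.2f} Hz, u = {u:4.2f} p.w./s: "
              f"B_x = {bx:6.1f} c.p.w. (about {bx * MHZ_PER_CPW:.2f} MHz)")
```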
2.4 Sample rate reduction of the luminance component

The sample rate for the luminance component, according to CCIR Rec. 601, is set at 13.5 MHz. Except 
for the highest quality requirements, the maximum bandwidth required for the luminance component is 
approximately 5 MHz (in any direction). Then with line-locked sampling, a sensible reduced-rate (3/4) sampling 
frequency is 10.125 MHz. However, good quality pictures can be obtained using 9.0 MHz sampling in a
line-quincunx pattern, giving a supportable bandwidth shape of the form shown in Fig. 9. This bandwidth shape is
optimum because it maximises the minimum bandwidth. The minimum bandwidth is in the diagonal direction and 
has a value which is equivalent to approximately 4.8 MHz. The maximum horizontal bandwidth is about 
5.3 MHz. 

A finite impulse response (FIR) filter having a 2-D response similar in shape to the optimum shown in 
Fig. 9 can be constructed, but this requires the addition of contributions from a large number of sample points
taken over several lines. Alternatively, a simpler form of 2-D filter, known as a 'comb filter', can be used. Block
diagrams of comb filters suitable for pre- and post-filtering are shown in Fig. 10. The length of the delay element,
T, in the filter is usually equal to one line period, but delays equal to a field period (313 lines) or a picture period
are also possible (Ref. 14). The shape of the frequency response for a pair of line-delay-based comb filters is shown in
Fig. 11, where the shape corresponds to a 6 dB contour.

Fig. 9 - Optimum 2-D bandwidth for line-quincunx luminance sampling at 9 MHz.

Fig. 10 - Examples of comb filters.
Comb filters are particularly suited to the sample rate reduction of PAL signals, provided that the reduced
sample rate has a frequency of exactly twice the colour subcarrier frequency (Refs. 5, 13 and 14).
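As a minimal illustration of why a delay-and-add structure is called a comb filter, the Python sketch below evaluates the response of the simplest case, $y(t) = [x(t) + x(t-T)]/2$, with T equal to one 64 µs line period. This single-delay form is assumed for illustration only; the filters of Fig. 10 may combine the delay with other elements. The response is $|\cos(\pi f T)|$, with peaks at multiples of the line frequency and nulls midway between them.

```python
import cmath
import math

LINE_PERIOD = 64e-6  # seconds per line in a 625-line, 25 picture/s system


def comb_response(f_hz, delay_s=LINE_PERIOD):
    """Magnitude response of y(t) = [x(t) + x(t - delay)] / 2."""
    return abs(0.5 * (1 + cmath.exp(-2j * math.pi * f_hz * delay_s)))


line_freq = 1 / LINE_PERIOD  # 15625 Hz
for k in (0.0, 0.25, 0.5, 1.0, 1.5, 2.0):
    f = k * line_freq
    print(f"{f:9.1f} Hz: |H| = {comb_response(f):.3f}")
```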

3. DIFFERENTIAL PULSE CODE MODULATION (DPCM) CODING 

A very successful technique for reducing the number of bits per sample is DPCM coding. In DPCM 
coding, a prediction of the signal is formed from surrounding sample values and the difference between the input 
and the prediction is quantised and transmitted, rather than the signal itself. The difference quantiser is nonlinear, 
coding small differences accurately and large differences, which occur less frequently, with less accuracy. Large 
differences or prediction errors tend to occur at edges or in detailed areas of the picture where the eye is less 
sensitive to quantisation distortion. The noise or distortion introduced by DPCM coding tends to be masked, 
therefore, because it is typically confined to a small fraction of the total picture area and also confined to those 
areas in which any distortion can be masked by the picture detail. 

At the DPCM decoder, the quantised prediction error is added to the corresponding prediction value to 
form the coded output signal. In order that both coder and decoder are working with the same prediction value, it
is necessary that the coder, in forming the prediction, uses only those same output sample values that are available
to the decoder. A DPCM coder must, therefore, also contain within it a decoder; block diagrams of the coder and 
the decoder are shown in Fig. 12. 
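The loop of Fig. 12 can be sketched in Python with the simplest predictor, the previous output sample. The 15 quantiser levels below are hypothetical (chosen only to be fine near zero and coarse for large errors; they are not the levels of Fig. 14), and 8-bit clipping is assumed. The point illustrated is that the coder contains the decoder: it predicts from its own reconstructed samples, so a decoder receiving the same indices produces identical output.

```python
# Hypothetical 15-level nonlinear quantiser: fine steps near zero, coarse steps
# for large prediction errors.
LEVELS = [-46, -33, -23, -15, -9, -5, -2, 0, 2, 5, 9, 15, 23, 33, 46]


def quantise(error):
    """Index of the output level nearest to the prediction error."""
    return min(range(len(LEVELS)), key=lambda i: abs(LEVELS[i] - error))


def reconstruct(prediction, index):
    """Decoder step: prediction + quantised error, clipped to the 8-bit range."""
    return min(max(prediction + LEVELS[index], 0), 255)


def dpcm_encode(samples, initial=128):
    """Previous-sample DPCM coder containing its own decoder."""
    prediction = initial
    indices, local_output = [], []
    for x in samples:
        idx = quantise(x - prediction)
        y = reconstruct(prediction, idx)   # embedded decoder
        indices.append(idx)
        local_output.append(y)
        prediction = y                     # next prediction = previous output
    return indices, local_output


def dpcm_decode(indices, initial=128):
    prediction = initial
    output = []
    for idx in indices:
        y = reconstruct(prediction, idx)
        output.append(y)
        prediction = y
    return output


line = [128, 128, 129, 131, 200, 201, 200, 60, 60]
codes, coder_view = dpcm_encode(line)
print(coder_view)
print(dpcm_decode(codes) == coder_view)   # coder and decoder stay in step
```

Small changes in the plain area are reproduced closely, while the large steps up to 200 and down to 60 are followed only gradually, an example of the slope overload discussed in Section 3.3.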

The prediction is formed by a sum of contributions from previous (output) sample values. The simplest 
prediction of the current sample is the previous (output) sample value. More complicated predictors include 
contributions from samples on the previous line, field or frame. When the number of coding levels is limited, the 
distortion in the reconstructed picture is minimised by 'optimising' the prediction, thereby minimising the power in 
the prediction-error signal. A technique for designing optimum predictors using picture statistics is described in the 
next sub-section. This method has been shown to give very good results for the coding of both composite PAL and
component YUV signals (Refs. 4, 5 and 7) and also gives a transmission system which is robust in the presence of transmission
errors. Prediction algorithms which adapt according to picture content might, in future, prove optimum at low bit
rates. However, DPCM systems using non-adaptive predictors, consisting of contributions from previous elements
from lines in the current field and lines in previous fields, have proved both practicable and capable of good
performance for a wide range of picture material. The non-adaptive approach is discussed further below.

3.1 Linear prediction 

The prediction is formed as a linear sum of previous sample values. If x[n], p[n], y[n] represent the input,
prediction, and output sample sequences respectively (as shown in Fig. 12) then:

$$p[n] = \sum_{k=1}^{N} a_k\,y[n-k] \approx \sum_{k=1}^{N} a_k\,x[n-k]$$

where N is the number of samples used in the prediction. 

In order to design a predictor based on the input samples, x[n], we have assumed $y[n] \approx x[n]$, which is the
case for high-quality coding. The prediction error, e[n], given by

$$e[n] = x[n] - p[n]$$

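has a mean square value, averaged over a picture, given by

$$\langle e^2[n]\rangle = \left\langle\Big(x[n] - \sum_{k=1}^{N} a_k\,x[n-k]\Big)^2\right\rangle$$

where $\langle\;\rangle$ represents the average value. The mean square prediction error, as a function of the predictor
coefficients, has a minimum when

$$\frac{\partial}{\partial a_j}\langle e^2[n]\rangle = 2\left\langle e[n]\,\frac{\partial e[n]}{\partial a_j}\right\rangle = 0 \qquad \text{for } j = 1, 2, \ldots, N$$

Now

$$\left\langle e[n]\,\frac{\partial e[n]}{\partial a_j}\right\rangle = \left\langle\Big(x[n] - \sum_{k=1}^{N} a_k\,x[n-k]\Big)\big(-x[n-j]\big)\right\rangle = -\langle x[n]\,x[n-j]\rangle + \sum_{k=1}^{N} a_k\,\langle x[n-j]\,x[n-k]\rangle = 0$$

Putting $\langle x[n]\,x[n-j]\rangle = R(j)$, which is the autocorrelation function of the input data sequence, this last
equation gives

$$R(j) - \sum_{k=1}^{N} a_k\,R(j-k) = 0 \qquad \text{for } j = 1, 2, \ldots, N$$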
If the autocorrelation function R(j) has been measured, this last set of N simultaneous equations can be
solved for the N unknowns, $a_k$, giving the coefficients of the optimum predictor. A value for the mean square
error, $\xi$, using the optimum predictor can then be calculated as follows:

$$\xi = \langle e^2[n]\rangle_{\text{opt}} = \left\langle e[n]\Big(x[n] - \sum_{k=1}^{N} a_k\,x[n-k]\Big)\right\rangle$$

As

$$\langle e[n]\,x[n-k]\rangle = -\left\langle e[n]\,\frac{\partial e[n]}{\partial a_k}\right\rangle = 0, \qquad k = 1, 2, \ldots, N$$

for the optimum predictor, it follows that

$$\xi = \langle e[n]\,x[n]\rangle = R(0) - \sum_{k=1}^{N} a_k\,R(k)$$

The mean power, $\xi$, of the prediction error signal tends to be less, therefore, than the mean power, R(0),
of the original signal.
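The normal equations above form a small linear system. The Python sketch below (a hypothetical helper using plain Gaussian elimination; Toeplitz-specific methods such as the Levinson-Durbin recursion would also work) solves it for a given list of autocorrelation values and returns the coefficients and $\xi$. As a check, an autocorrelation of the form $R(j) = \rho^{|j|}$ with $R(0) = 1$ gives the known result $a_1 = \rho$, all other $a_k = 0$, and $\xi = 1 - \rho^2$.

```python
def optimum_predictor(R, N):
    """Solve sum_k a_k R(j-k) = R(j), j = 1..N, for the predictor coefficients.

    R: autocorrelation values R(0), R(1), ..., R(N); R is even, so
    R(j-k) = R(|j-k|). Returns (a, xi) with xi = R(0) - sum_k a_k R(k).
    """
    # Augmented matrix [A | b]: row j is equation j+1, column k is a_(k+1).
    M = [[R[abs(j - k)] for k in range(N)] + [R[j + 1]] for j in range(N)]
    # Forward elimination with partial pivoting.
    for col in range(N):
        pivot = max(range(col, N), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, N):
            factor = M[r][col] / M[col][col]
            for c in range(col, N + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    a = [0.0] * N
    for r in range(N - 1, -1, -1):
        a[r] = (M[r][N] - sum(M[r][c] * a[c] for c in range(r + 1, N))) / M[r][r]
    xi = R[0] - sum(a[k] * R[k + 1] for k in range(N))
    return a, xi


rho = 0.9
a, xi = optimum_predictor([rho ** j for j in range(4)], N=3)
print([round(c, 6) for c in a], round(xi, 6))
```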

In the frequency domain, this process of optimisation ensures that the prediction will be accurate at 
frequencies with large component amplitudes; for example, at frequencies around d.c. and around colour-subcarrier 
frequency in composite PAL signals. It can be shown that the statistically optimum predictor tends to make the 
power spectral density of the prediction error the same at all frequencies (Ref. 15). (In other words, the spectrum of the
prediction-error signal tends to be 'white'.) 

However, it is not possible to design a DPCM predictor which is accurate at all signal frequencies. To see
this, consider the prediction error response, $E(\omega)$, to an input frequency, $\exp(j\omega t)$. Then for a sampling interval
of T

$$E(\omega) = 1 - \sum_{s=1}^{N} a_s \exp(-j\omega sT)$$

i.e.

$$|E(\omega)|^2 = \Big[1 - \sum_{s=1}^{N} a_s \exp(-j\omega sT)\Big]\Big[1 - \sum_{r=1}^{N} a_r \exp(j\omega rT)\Big]$$

which gives, after some rearrangement,

$$|E(\omega)|^2 = 1 - 2\sum_{s=1}^{N} a_s\cos(\omega sT) + \sum_{s=1}^{N} a_s^2 + 2\sum_{r=1}^{N-1}\sum_{s=1}^{N-r} a_s\,a_{s+r}\cos(\omega rT)$$

Since every cosine term integrates to zero over $-\pi/T \le \omega \le \pi/T$, the average prediction-error power spectral
density up to half sampling frequency is then

$$\frac{T}{2\pi}\int_{-\pi/T}^{\pi/T} |E(\omega)|^2\,d\omega = 1 + \sum_{s=1}^{N} a_s^2$$

This expression for the average value of $|E(\omega)|^2$ shows that if the prediction is good (i.e. $|E(\omega)|$ is
small) for a wide range of frequencies, then the prediction must be correspondingly poor (giving a large value for
$|E(\omega)|^2$) in the remaining frequency range. This is illustrated in Fig. 13. A good predictor, therefore, matches
those spectral regions where the prediction is best with regions where there is, statistically, most energy. This 
matching, of the prediction to the power spectrum, is obtained via the auto-correlation function of an 'average' 
television signal as described above. 
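The result that the average of $|E(\omega)|^2$ equals $1 + \sum a_s^2$, whatever the coefficients, can be checked numerically. The Python sketch below uses hypothetical coefficients and a uniform frequency grid over one period, which integrates these trigonometric sums exactly. It also shows the trade-off for the previous-sample predictor ($a_1 = 1$): zero error at d.c. but $|E|^2 = 4$ at half the sampling frequency.

```python
import cmath
import math


def error_response(a, wT):
    """E(w) = 1 - sum_s a_s exp(-j w s T), with wT = w * T."""
    return 1 - sum(a_s * cmath.exp(-1j * wT * s) for s, a_s in enumerate(a, start=1))


def mean_error_power(a, points=256):
    """Average of |E(w)|^2 over -pi/T <= w < pi/T on a uniform grid
    (exact for trigonometric polynomials of degree below `points`)."""
    total = 0.0
    for m in range(points):
        wT = -math.pi + 2 * math.pi * m / points
        total += abs(error_response(a, wT)) ** 2
    return total / points


a = [0.75, 0.5, -0.25]  # hypothetical predictor coefficients
print(mean_error_power(a), 1 + sum(c * c for c in a))

# Previous-sample predictor: perfect at d.c., poor at half the sampling frequency.
print(abs(error_response([1.0], 0.0)) ** 2, abs(error_response([1.0], math.pi)) ** 2)
```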
Fig. 13 - Example of a one-dimensional prediction error frequency response, illustrating the fact
that it is not possible to design a (non-adaptive) predictor that is good at all frequencies. 

3.2 Adaptive prediction 

Many adaptive strategies have been proposed, for example by Graham (Ref. 16), Zschunke (Ref. 17) and Dukhovich and
O'Neal (Ref. 18). These strategies involve adaptation between various simple two-dimensional or one-dimensional predictors
according to decisions, about the expected nature of the picture material, based on previously transmitted
information. More recently, DPCM prediction strategies have been proposed which include motion compensated
prediction (Ref. 19). Also, DPCM systems which explicitly signal the predictor adaptation information have been tested (Ref. 20).

It is beyond the scope of this Report to review all these different systems. However, preliminary
comparative studies (Ref. 21) suggest that there is not a great deal of difference between their performance. For practical
systems, therefore, other factors such as instrumental complexity and decoder stability in the presence of 
transmission errors assume more importance. 

3.3 Nonlinear quantisation 

A typical DPCM nonlinear quantiser with fifteen output levels is shown in Fig. 14. The impairments 
introduced by nonlinear quantisation fall into three categories known as 'granular noise', 'edge busyness' and 'slope 
overload'. Granular noise appears typically in plain areas of the picture where the prediction error is small and is 
quantised with the more closely spaced, inner levels of the quantiser. Edge busyness results from larger quantising 
distortions added at less predictable edges; this quantising error can vary from line to line and field to field giving 
the appearance of edge-localised noise. The eye is less sensitive to noise localised on edges than in plain areas, 
however, and the spacing of the mid-range quantiser levels can be increased accordingly. Slope overload occurs 
when the prediction error at an edge is greater than the maximum magnitude of the quantiser output and gives rise 
to an apparent softening of such an edge. 

The precise spacings of the output levels of a nonlinear quantiser are not particularly critical. The arrangement of levels is a compromise between two requirements: the quantiser must span a range sufficient to prevent slope overload, yet it must not introduce an excessive level of granular noise or edge busyness.

The noise and distortion introduced by the nonlinear quantiser determine the subjective picture quality. However, it is the statistics of the prediction error signal which dictate the overall quantiser parameters. Ultimately, therefore, it is the predictor design which determines the picture quality at a given bit rate.

Two different approaches to the design of nonlinear quantisers are described below. 

3.3.1 Minimum mean-square-error (m.m.s.e.) method 

A common approach is to construct a quantiser which minimises, for a given predictor, the mean square quantisation error for a given number of output levels. Following this approach, the probability density function, $p(x)$, of the prediction error, $x$, is first measured for a given predictor and a range of picture material. The quantiser output levels, $y_i$, and decision levels, $x_i$, are then chosen such that the mean square error, $\sigma_q^2$, is minimised, where

$$\sigma_q^2 = \sum_{i=1}^{N} \int_{x_i}^{x_{i+1}} (x - y_i)^2 \, p(x)\, dx$$

For a fixed number of levels, $N$, appropriate values of $x_i$ and $y_i$ can be found numerically 22. Approximate values of these $x_i$ and $y_i$ can be determined as follows 23. Assume $p(x)$ is approximately constant between $x_i$ and $x_{i+1}$. Then, with $y_i$ midway between $x_i$ and $x_{i+1}$,

$$\int_{x_i}^{x_{i+1}} (x - y_i)^2 \, p(x)\, dx \approx \frac{1}{12}\, p(y_i)\, \delta_i^3$$

where $\delta_i = x_{i+1} - x_i$, i.e.

$$\sigma_q^2 \approx \frac{1}{12} \sum_{i=1}^{N} p(y_i)\, \delta_i^3$$

In addition,

$$\sum_{i=1}^{N} p^{1/3}(y_i)\, \delta_i \approx \int p^{1/3}(y)\, dy = K$$

Here, for a given distribution, $K$ is a constant which is independent of the magnitudes of the individual $\delta_i$. Setting $\beta_i = p^{1/3}(y_i)\, \delta_i$, the problem of minimising $\sigma_q^2$ is equivalent to that of minimising

$$\sum_{i=1}^{N} \beta_i^3$$

subject to the condition that

$$\sum_{i=1}^{N} \beta_i = \text{constant}$$

Using the Lagrange Method of Undetermined Multipliers, it follows that $\sigma_q^2$ is a minimum when

$$\beta_1 = \beta_2 = \beta_3 = \dots = \beta_N = \text{constant}$$

Therefore, the decision levels, $x_i$, can be determined by dividing the area under the curve of the cube root of the probability density function, $p^{1/3}(x)$, into $N$ equal parts. In this approximate approach the output levels, $y_i$, are chosen to lie midway between the decision levels.
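The approximate cube-root rule can be sketched numerically. This is a minimal Python sketch, not the method of Ref. 22: it assumes a Laplacian model for $p(x)$ (a common choice for prediction errors, but not the measured distribution referred to above), a range of ±128 and 15 output levels; `cube_root_quantiser` and `laplacian` are illustrative names. It integrates $p^{1/3}(x)$ numerically, places the decision levels at equal-area points and sets each output level midway between adjacent decision levels.

```python
import math


def cube_root_quantiser(pdf, lo, hi, n_levels, steps=100_000):
    """Approximate m.m.s.e. quantiser: equal areas under pdf(x) ** (1/3).

    Returns (decisions, outputs): n_levels + 1 decision levels spanning
    [lo, hi] and n_levels output levels placed midway between them.
    """
    dx = (hi - lo) / steps
    # Running area under the cube root of the pdf (midpoint rule);
    # cum[k] is the area from lo up to lo + k*dx.
    cum = [0.0]
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        cum.append(cum[-1] + pdf(x) ** (1.0 / 3.0) * dx)
    total = cum[-1]

    decisions = [lo]
    k = 0
    for i in range(1, n_levels):
        target = total * i / n_levels          # i-th equal-area boundary
        while cum[k + 1] < target:
            k += 1
        frac = (target - cum[k]) / (cum[k + 1] - cum[k])  # interpolate in cell k
        decisions.append(lo + (k + frac) * dx)
    decisions.append(hi)

    outputs = [(a + b) / 2 for a, b in zip(decisions, decisions[1:])]
    return decisions, outputs


# Hypothetical Laplacian model of the prediction-error pdf (scale 8, 8-bit range).
def laplacian(x, b=8.0):
    return math.exp(-abs(x) / b) / (2 * b)


decisions, outputs = cube_root_quantiser(laplacian, -128.0, 128.0, 15)
print([round(y, 1) for y in outputs])
```

With this model the inner output levels come out closely spaced and the outer levels widely spaced, which is the characteristic behaviour of the m.m.s.e. design.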

3.3.2 Graphical design method 

Fig. 15 shows the magnitude of the quantisation error, $|q_e|$, plotted as a function of prediction error, $p_e$. The points where the quantisation error is zero correspond to the output levels of the quantiser. A subjectively good quantiser can be constructed 24 by arranging that the maximum quantisation error always lies below the line given by

$$q_e = m\, p_e + c \qquad \text{for } p_e \le T$$

The values of $m$, $c$ and $T$ are easily adjusted to obtain a quantising law with the required number of output levels. The value of $T$ must be chosen such that slope overload effects are not visible. The values of $c$ and $m$ then affect the levels of granular noise and edge busyness respectively.
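One way to turn the graphical rule into a set of levels is sketched below. It is a minimal sketch under stated assumptions, not the procedure of Ref. 24: integer output levels, a zero output level, decision levels midway between outputs (so the error sawtooth peaks midway), and a greedy choice of the widest spacing that keeps each peak under the line. The values $m = 0.25$, $c = 1$ and $T = 50$ are purely illustrative, and `graphical_quantiser` is a hypothetical name.

```python
import math


def graphical_quantiser(m, c, T):
    """Greedy construction of integer output levels under the line m*p + c.

    Between adjacent levels y and y + g the quantisation error is a sawtooth
    that peaks at g/2 midway, at p = y + g/2. Requiring g/2 <= m*(y + g/2) + c
    gives g <= 2*(m*y + c)/(1 - m). Levels are added until the range T is
    reached and then mirrored to give the negative half.
    """
    if not (0 <= m < 1 and c >= 0.5):
        raise ValueError("need 0 <= m < 1 and c >= 0.5 (so that g >= 1)")
    levels = [0]
    while levels[-1] < T:
        y = levels[-1]
        g = math.floor(2 * (m * y + c) / (1 - m))  # widest allowed spacing
        levels.append(min(y + g, T))               # last level clipped to T
    return [-v for v in reversed(levels[1:])] + levels


print(graphical_quantiser(m=0.25, c=1.0, T=50))
# [-50, -39, -22, -12, -6, -2, 0, 2, 6, 12, 22, 39, 50]
```

Raising $c$ widens the inner levels (more granular noise), while raising $m$ widens the mid-range and outer levels (more edge busyness), matching the roles given to these parameters in the text.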

The graphical design method has been found to give subjectively slightly better quantisers than the m.m.s.e. 
method. This is because the m.m.s.e. method tends to give a wide spacing to the outer levels of the quantiser 
which can give rise to visible distortions on some picture material. 

Much work has been done to include subjective factors and subjective measurements in the design of nonlinear quantisers 25,26,27,28. However, it has been found that, given system constraints such as the number of bits available per sample, the two simple design techniques described give a subjective performance near the optimum.




Fig. 15 - Example of simple graphical design of nonlinear quantiser ($|q_e|$ plotted against prediction error, $p_e$).



3.4 Adaptive quantisation 

Quantisation noise tends to be masked in active picture areas. The picture 'activity' can be estimated, and 
the quantiser adapted accordingly, by altering the spacing of the inner levels of the quantiser such that the resulting 
level of granular noise always remains below the threshold of perception. Adaptation thereby allows the range of 
the quantiser to be increased in active areas. 

Various activity measures are possible. For example, Pirsch 26 used as a measure the maximum magnitude 
of the differences between neighbouring picture elements. The value of the activity measure determines which 
quantiser, from a small selection, is chosen to code the following sample. In order to avoid the need to transmit 
quantiser selection information to the decoder, the activity measure is based only on information which would also 
be available at the decoder. 
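The point that the activity measure uses only information also available at the decoder can be illustrated with a small backward-adaptive DPCM loop. This is a minimal sketch with hypothetical choices throughout: the activity measure is taken as the largest difference among the already-decoded left, upper-left, upper and upper-right neighbours (one reading of Pirsch's maximum-difference measure), and the 9-level law, the four scale factors, the thresholds and the previous-element predictor are all illustrative. Because the coder and decoder both select the quantiser from reconstructed samples, the decoder stays in step using the transmitted indices alone.

```python
BASE_LEVELS = [-40, -20, -9, -3, 0, 3, 9, 20, 40]  # hypothetical 9-level law
SCALES = [1.0, 1.5, 2.0, 3.0]  # four quantisers, inner spacing widening with activity
THRESHOLDS = (4, 12, 32)  # hypothetical activity breakpoints


def activity(recon, r, c):
    """Largest difference among already-decoded neighbours of sample (r, c)."""
    nbrs = [recon[r][c - 1]] if c > 0 else []
    if r > 0:
        above = recon[r - 1]
        nbrs += [above[cc] for cc in (c - 1, c, c + 1) if 0 <= cc < len(above)]
    return max(nbrs) - min(nbrs) if len(nbrs) > 1 else 0


def quantiser_for(recon, r, c):
    """Pick a quantiser using decoded data only, so no selection data is sent."""
    a = activity(recon, r, c)
    mode = sum(a >= t for t in THRESHOLDS)  # 0 = plain area ... 3 = very active
    return [int(round(v * SCALES[mode])) for v in BASE_LEVELS]


def predict(recon, r, c):
    """Hypothetical predictor: previous sample, else sample above, else 128."""
    if c > 0:
        return recon[r][c - 1]
    return recon[r - 1][c] if r > 0 else 128


def dpcm(h, w, image=None, indices=None):
    """Coder if image is given, decoder if indices are given.

    Both ends run the same loop and return (indices, reconstruction)."""
    coding = indices is None
    idx = [] if coding else list(indices)
    recon = [[0] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            levels = quantiser_for(recon, r, c)
            p = predict(recon, r, c)
            if coding:
                e = image[r][c] - p
                i = min(range(len(levels)), key=lambda j: abs(e - levels[j]))
                idx.append(i)
            else:
                i = idx[n]
            n += 1
            recon[r][c] = min(255, max(0, p + levels[i]))
    return idx, recon


img = [[(17 * r + 5 * c * c) % 256 for c in range(8)] for r in range(6)]
sent, coder_view = dpcm(6, 8, image=img)
_, decoder_view = dpcm(6, 8, indices=sent)
print(decoder_view == coder_view)  # True: the decoder stays in step
```

The same structure also shows why transmission errors are troublesome here: a corrupted index alters the decoded samples, which in turn can alter the decoder's quantiser selection for later samples.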

The prediction error itself is also a measure of local picture activity and an alternative activity measure 
could be formed by taking a weighted sum of the magnitudes of previously transmitted (quantised) prediction 
errors. Schafer 28 has performed subjective tests of DPCM coding using two-dimensional prediction and adaptive 
quantisation with up to four different quantisers. His results show that there is a definite advantage, in terms of picture quality, in using adaptive nonlinear quantisation. There are, however, disadvantages, some of which are:

1) At the decoder, in the presence of transmission errors, there may be error extension problems caused 
by the incorrect selection of quantiser mode. 

2) The quantiser can adapt too slowly to certain picture detail giving perceptible slope overload effects. 
3) There are some regular, high-frequency patterns which give a high activity measure but which do not mask quantisation noise. For example, if granular noise is added uniformly to test pictures such as BBC Test Card F or Test Card G, which contain frequency gratings at several frequencies, the noise appears to be as visible in the highest-frequency grating as it is in the plain areas of the picture. However, in the mid-range frequency gratings the noise is completely masked by the picture detail.

Within the scope of this Report, it is only possible to mention these difficulties but not to discuss them 
further. 

3.5 Stability of (non-adaptive) DPCM decoders and coders 

DPCM decoders and coders can both exhibit instability. However, the meaning and manifestation of instability are not the same for both.

A stable decoder has an impulse response which decays with time. For an unstable decoder the instability is excited by errors in the transmission path between coder and decoder. The decoder loop stability may be determined by examining the frequency response, $P(\omega)$, of the predictor. The overall frequency response, $A(\omega)$, of a recursive decoder loop is given by

$$A(\omega) = \frac{1}{1 - P(\omega)}$$

and if the plot of $P(\omega)$ (in the complex plane) encloses the point $(1, 0)$ then the loop will be unstable and the effect of a transmission error can remain or grow. The design procedure described earlier for non-adaptive, linear predictors tends to give stable decoders.
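For a one-dimensional predictor, the encirclement test can be carried out numerically by tracking the winding of $1 - P(\omega)$ about the origin as $\omega$ runs from 0 to $2\pi$. A minimal sketch, assuming a purely horizontal predictor $\hat{s}(n) = \sum_i a_i\, s(n - i)$; the predictors discussed in this Report are two- or three-dimensional, for which this simple test is not sufficient, and the function names are hypothetical. For this 1-D case a zero winding number is equivalent to all roots of $z^n - a_1 z^{n-1} - \dots - a_n$ lying inside the unit circle.

```python
import cmath
import math


def predictor_response(coeffs, w):
    """P(w) for a 1-D predictor: prediction = sum of a_i * s[n - i], i = 1, 2, ..."""
    return sum(a * cmath.exp(-1j * w * i) for i, a in enumerate(coeffs, start=1))


def decoder_is_stable(coeffs, points=4096):
    """True if the locus of P(w), w = 0..2*pi, does not encircle (1, 0).

    The net winding of 1 - P(w) about the origin is accumulated from small
    phase increments; a non-zero winding means the loop 1/(1 - P(w)) is unstable.
    """
    prev = 1 - predictor_response(coeffs, 0.0)
    total = 0.0
    for k in range(1, points + 1):
        cur = 1 - predictor_response(coeffs, 2 * math.pi * k / points)
        if abs(cur) < 1e-12 or abs(prev) < 1e-12:
            return False  # locus passes through (1, 0): marginal at best
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi)) == 0


print(decoder_is_stable([0.9]))        # True
print(decoder_is_stable([1.5]))        # False
print(decoder_is_stable([0.5, 0.25]))  # True
```

Note that plain previous-element prediction with a coefficient of exactly 1 is reported as not stable: the locus of $P(\omega)$ passes through $(1, 0)$ at d.c., so a transmission error persists indefinitely rather than decaying.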

With stable systems, it is not necessary to transmit periodic resetting information to ensure that the coder 
and the decoder remain in step. Any difference between coder and decoder decays away, except, perhaps, for some 
small differences resulting from the practical requirement to truncate the predictor output to a limited accuracy 
before addition to the quantised difference signal (Fig. 12). This truncation adds random truncation noise to the 
prediction values and can be considered as equivalent noise added to the transmitted (quantised) difference signal. 
At low frequencies, where $P(\omega)$ is close to unity, this low-level noise is amplified by the response, $A(\omega)$, of the loop, which can give rise to visible low-frequency noise patterning in the decoder output. The amplitude of this
patterning can be reduced by increasing the accuracy of the arithmetic around the decoding loop (at both decoder 
and coder) or by incorporating 'truncation error feedback'. In this latter technique, the prediction truncation error 
is added back into the prediction value, for example on the next clock period. This has the effect of putting a null 
at d.c. in the spectrum of the truncation noise, thereby greatly reducing its visibility 29 . 
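The effect of truncation error feedback can be illustrated outside the DPCM loop. A minimal sketch, assuming truncation towards minus infinity and a constant fractional prediction value standing in for a plain picture area; the function names are hypothetical. With plain truncation the noise has a steady d.c. component, so its running sum grows without limit; with feedback the noise is the first difference of the carried fraction, so its running sum stays bounded, which corresponds to the null at d.c. described above.

```python
import math


def truncate_plain(values):
    """Truncate each prediction value to an integer (rounding towards minus infinity)."""
    return [math.floor(v) for v in values]


def truncate_with_feedback(values):
    """Truncation error feedback: the fraction discarded at one clock period is
    added back into the prediction value at the next period."""
    out, carry = [], 0.0
    for v in values:
        t = v + carry
        q = math.floor(t)
        carry = t - q      # truncation error, always in [0, 1)
        out.append(q)
    return out


def accumulated_noise(values, truncated):
    """Running sum of the truncation noise, a rough proxy for its d.c. content."""
    return sum(q - v for q, v in zip(truncated, values))


vals = [10.3] * 1000  # a plain area: constant fractional prediction value
print(accumulated_noise(vals, truncate_plain(vals)))          # about -300
print(accumulated_noise(vals, truncate_with_feedback(vals)))  # magnitude below 1
```

In a real coder and decoder the same feedback would have to be applied identically at both ends, so that their predictions remain in step.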

Coder instability results from an interaction between nonlinear quantiser and predictor and manifests itself as a so-called 'limit cycle oscillation', which consists of a pattern of quantisation error which is self-sustaining, even in plain picture areas. Pirsch 30 has analysed this process in detail; it may be outlined as follows. Consider, for example, a particular limit cycle which has an amplitude $q_1$. In a plain area the prediction error could then be as much as $q_1 \sum_i |a_i|$, where $a_i$ are the predictor coefficients. If, at prediction errors up to this value, the maximum quantisation error is equal to (or greater than) $q_1$, then it is possible for the limit cycle to be self-sustaining. In other words, limit cycling can occur if, at any region of the quantiser,

$$f\, q_1 \sum_i |a_i| \ge q_1 \qquad \text{or} \qquad f \ge \frac{1}{\sum_i |a_i|}$$

where $f$, a function of the prediction error $p_e$, is given by

$$f(p_e) = \frac{\text{maximum quantisation error at } p_e}{|p_e|}$$

Very large amplitude limit cycling can sometimes arise when the number of coefficients in the predictor is large and the output range of the quantiser is small. In this case, the value of $f$ increases for large prediction errors such that the above inequality is satisfied.
It should be noted that the inequality is only a necessary condition for instability to occur. In many cases, limit cycles may not appear even though the above condition is satisfied.

3.6 Results of some non-adaptive DPCM coding experiments 

At BBC Research Department, experimental non-adaptive DPCM equipment was constructed to investigate 
the quality of coding sub-Nyquist sampled PAL television signals for transmission within 34 Mbit/s 4 . Very good 
quality coding was obtained using a 14-element two-field predictor and a 22-level quantiser (4.5 bits/sample). The 
predictor is illustrated in Fig. 16 which gives the weighted contributions taken from sample points from lines in 
two fields. This prediction was derived using average picture statistics from stationary pictures and assuming a 
temporal auto-correlation function with a decay factor of 0.85 between fields. This factor was derived 
experimentally and was chosen to limit the weight given to previous-field information in order to avoid introducing 
excessive prediction errors on moving material. The system proved very tolerant to transmission errors. For example, informal measurements showed that a bit-error rate of 1 in $10^6$ (with any error correction turned off) gave only a 'perceptible' impairment (Grade 4 on the CCIR 5-grade impairment scale).
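How an assumed field-to-field decay factor enters the derivation of such a predictor can be sketched with the normal equations of linear prediction, $\mathbf{R}\,\mathbf{a} = \mathbf{r}$. This is an illustrative sketch only: it substitutes a separable exponential autocorrelation model $R(\Delta x, \Delta y, \Delta t) = \rho_h^{|\Delta x|} \rho_v^{|\Delta y|} \rho_t^{|\Delta t|}$ for the measured average picture statistics used in the Report, and the values $\rho_h = 0.95$, $\rho_v = 0.9$, the neighbour offsets and the function names are hypothetical; only the factor 0.85 comes from the text. It does not reproduce the Fig. 16 predictor.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def autocorr(d, rho_h, rho_v, rho_t):
    """Assumed separable model R(dx, dy, dt) = rho_h^|dx| * rho_v^|dy| * rho_t^|dt|."""
    dx, dy, dt = d
    return rho_h ** abs(dx) * rho_v ** abs(dy) * rho_t ** abs(dt)


def optimum_predictor(offsets, rho_h, rho_v, rho_t):
    """Minimum-m.s.e. coefficients a from the normal equations R a = r.

    offsets are (dx, dy, dt) displacements of previously coded samples from
    the sample being predicted (dt counted in fields)."""
    R = [[autocorr(tuple(p - q for p, q in zip(o1, o2)), rho_h, rho_v, rho_t)
          for o2 in offsets] for o1 in offsets]
    r = [autocorr(o, rho_h, rho_v, rho_t) for o in offsets]
    return solve(R, r)


# Hypothetical example: previous sample, sample above, sample above-left, and the
# co-sited sample one field earlier, with the 0.85 field-to-field decay factor.
nbrs = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
print([round(a, 3) for a in optimum_predictor(nbrs, 0.95, 0.9, 0.85)])
```

Lowering $\rho_t$ reduces the weight the solution gives to previous-field samples, which is the mechanism the text describes for limiting prediction errors on moving material.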

These non-adaptive techniques have also proved effective in the coding of YUV signals, sampled with an 
orthogonal sampling structure according to CCIR Recommendation 601. Predictors which take contributions from 
samples within three fields and which are suitable for the luminance and chrominance components are described in 
Tables 1 and 2 respectively. In these tables, a picture-based coordinate system is used as shown in Fig. 17. Using such predictors, good quality coding can be obtained down to 4 bits/sample. At
3 bits/sample (or 8-level quantisation) granular noise and edge busyness become perceptible. If it is required to 
improve the quality further at this low number of bits/sample, further techniques, such as variable-length coding 
combined with adaptive prediction or quantisation, are required. Variable-length coding is discussed in Section 5. 

Table 1: 15-element luminance predictor.

Predictor coefficients

  Coordinates      Value
   i    j    k
   1    0    0     1.000
   2    0    0    -0.437
   3    0    0     0.235
   0    2    0     0.327
   1    2    0    -0.195
   0   -1    0     0.201
   1   -1    0    -0.158
   0    1    0     0.172
   1    1    0    -0.172
   0    0    1     0.460
   1    0    1    -0.497
   2    0    1     0.277
   3    0    1    -0.149
   0    2    1    -0.291
   1    2    1     0.198



Table 2: 7-element chrominance predictor.

Predictor coefficients

  Coordinates      Value
   i    j    k
   1    0    0     0.500
   1    2    0     0.194
   0    2    0     0.272
   0    0    1     0.655
   1    0    1     0.317
   1    2    1    -0.137
   0    2    1    -0.191



Note that the horizontal scale for the chrominance 
predictor coordinates is twice that of the luminance 
predictor coordinates. 



4. TRANSFORM CODING 

Transform coding is an increasingly popular technique for digital video bit-rate reduction. In this technique, 
a block of samples from the spatial domain is transformed into a second block of data consisting of a set of 
'coefficients'. Each coefficient represents the amplitude of a particular pattern present within that block in the 
spatial domain. The original block of samples in the spatial domain can be recovered by performing the inverse 
transformation on the block of coefficients. 

For typical blocks of picture data, the amplitudes of many of the transform coefficients are small. Thus they can be either approximated by zero or quantised using only a few bits per coefficient without introducing
significant distortion when the coefficient block is transformed back to the spatial domain. Fewer bits overall are 
then required to transmit the picture block in the transformed domain, for a given decoded picture quality. 

Many different forms of transformation have been investigated for bit-rate reduction. The best transforms 
are those which tend to concentrate the energy of a picture block into a few coefficients. The Discrete Cosine 
Transform (DCT) is one of the best transforms in this respect 31 . In addition, the DCT and its inverse can be 
carried out in a relatively efficient computational manner 32 . Consequently, at the present time, the DCT is the 
transform most widely used, or proposed for use, in transform-based bit-rate-reduction systems. 

This section concentrates on the DCT in order to illustrate the sorts of processes that are used for bit-rate reduction. A derivation of the Discrete Cosine Transform is given first, followed by its extension into two dimensions. Further sub-sections then describe some of the bit-rate-reduction methods that can be used on the transformed data.
4.1 The Discrete Cosine Transform (DCT)

Consider a signal, $f(x)$, with a Fourier transform, $F(\omega)$:

$$F(\omega) = \int_{-\infty}^{\infty} f(x)\, \exp(-j\omega x)\, dx \tag{4.1}$$

The inverse transformation recovers $f(x)$ from $F(\omega)$:

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, \exp(j\omega x)\, d\omega \tag{4.2}$$

Consider that a portion, $g(x)$, of length $X$, of the signal is selected as follows:

$$g(x) = f(x)\, b(x) \tag{4.3}$$

where

$$b(x) = \begin{cases} 1 & 0 \le x \le X \\ 0 & x < 0 \text{ or } x > X \end{cases}$$

In order to obtain a transform containing only cosine terms, a function $g_r(x)$ is first constructed by reflecting $g(x)$ about $x = 0$, i.e.

$$g_r(x) = \begin{cases} g(x) & x \ge 0 \\ g(-x) & x < 0 \end{cases} \tag{4.4}$$

Then, the Fourier transform $G_r(\omega)$ of $g_r(x)$ is given by

$$G_r(\omega) = \int_{-X}^{+X} g_r(x)\, \exp(-j\omega x)\, dx = 2 \int_{0}^{X} g(x) \cos(\omega x)\, dx \tag{4.5}$$

If the original signal consisted of a sampled signal with $N$ sampling intervals within the length $X$, then

$$g(x) = \sum_{k=0}^{N-1} s(k)\, \delta\!\left(x - \frac{kX}{N} - t\right) \tag{4.6}$$

where $t$ represents the sampling phase. Most commonly for the cosine transform, $t = X/2N$. Then equations (4.5) and (4.6) give

$$G_r(\omega) = 2 \sum_{k=0}^{N-1} s(k) \cos\!\left[\frac{\omega X}{N}\left(k + \tfrac{1}{2}\right)\right] \tag{4.7}$$



This last expression gives the value of $G_r(\omega)$ at all frequencies. In the discrete cosine transform, $\omega$ takes only discrete values. The precise formulation of the transform can be considered to result from the following argument. Consider a periodic function, $r(x)$, generated by repeated shifting and adding of $g_r(x)$, i.e.

$$r(x) = \sum_{n=-\infty}^{\infty} g_r(x - 2nX) \tag{4.8}$$

The Fourier transform, $R(\omega)$, of $r(x)$ can be obtained from $G_r(\omega)$ by using the Fourier 'shift' property that the transform of $g_r(x - 2nX)$ is $G_r(\omega) \exp(-j\omega\, 2nX)$, i.e.

$$R(\omega) = G_r(\omega) \sum_{n=-\infty}^{\infty} \exp(-j\omega\, 2nX) \tag{4.9}$$
The sum of exponentials in (4.9) is non-zero only at frequencies which are multiples of $1/2X$, that is, at angular frequencies which are multiples of $\omega_0 = 2\pi/2X$. At these multiples, the value of the sum is infinite. More precisely (see for example Ref. 8)

$$\sum_{n=-\infty}^{\infty} \exp(-j\omega\, 2nX) = \frac{\pi}{X} \sum_{n=-\infty}^{\infty} \delta(\omega - n\omega_0) \qquad \text{where } \omega_0 = \frac{2\pi}{2X}$$

i.e.

$$R(\omega) = \frac{\pi}{X} \sum_{n=-\infty}^{\infty} G_r(n\omega_0)\, \delta(\omega - n\omega_0) \tag{4.10}$$

Thus, the spectrum of the repeating function, $r(x)$, consists of the spectrum, $G_r(\omega)$, of the boxed and reflected waveform $g_r(x)$, sampled at intervals of the repeat frequency, $\omega_0$. Alternatively, since $r(x)$ defined by (4.8) is a periodic function, it can be expanded as a traditional Fourier series:

$$r(x) = \sum_{n=-\infty}^{\infty} A(n)\, \exp(jn\omega_0 x) \tag{4.11}$$

and

$$A(n) = \frac{1}{2X} \int_{-X}^{+X} g_r(x)\, \exp(-jn\omega_0 x)\, dx \tag{4.12}$$

Using equation (4.6) we obtain

$$A(n) = \frac{1}{X} \sum_{k=0}^{N-1} s(k) \cos\!\left[\frac{\pi}{N}\, n\left(k + \tfrac{1}{2}\right)\right] \tag{4.13}$$

$$A'(n) = X A(n) = \sum_{k=0}^{N-1} s(k) \cos\!\left[\frac{\pi}{N}\, n\left(k + \tfrac{1}{2}\right)\right]$$

Note that comparing (4.7) and (4.13) gives

$$A(n) = \frac{1}{2X}\, G_r(n\omega_0) \tag{4.14}$$

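The inverse of equation (4.13) is obtained by multiplying both sides by $c(n) \cos[\pi n(m + \frac{1}{2})/N]$, by summing over $n$, and by using the identity

$$\sum_{n=0}^{N-1} c(n) \cos\!\left[\frac{\pi}{N}\, n\left(k + \tfrac{1}{2}\right)\right] \cos\!\left[\frac{\pi}{N}\, n\left(m + \tfrac{1}{2}\right)\right] = \begin{cases} 0 & k \ne m \\ N/2 & k = m \end{cases} \tag{4.15}$$

where

$$c(n) = \begin{cases} 1/2 & n = 0 \\ 1 & n \ne 0 \end{cases}$$

This gives

$$s(m) = \frac{2}{N} \sum_{n=0}^{N-1} c(n)\, A'(n) \cos\!\left[\frac{\pi}{N}\, n\left(m + \tfrac{1}{2}\right)\right], \qquad m = 0, \dots, N-1 \tag{4.16}$$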
This last expression shows how the set of $N$ sample values in the spatial domain can equally be represented by a sum of cosine terms of amplitude $(2/N)\, c(n)\, A'(n)$, where the $A'(n)$ are given by equation (4.13). Each cosine term is termed a DCT 'basis function'. The DCT 'coefficients' $A'(n)$ are usually normalised such that the transform and its inverse appear more symmetrical, in the following manner:

$$S(n) = \sqrt{\frac{2}{N}}\, C(n)\, A'(n)$$

where

$$C(n) = \begin{cases} 1/\sqrt{2} & n = 0 \\ 1 & n \ne 0 \end{cases}$$

Then

$$S(n) = \sqrt{\frac{2}{N}}\, C(n) \sum_{k=0}^{N-1} s(k) \cos\!\left[\frac{\pi}{N}\, n\left(k + \tfrac{1}{2}\right)\right] \tag{4.17}$$

and

$$s(m) = \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} C(n)\, S(n) \cos\!\left[\frac{\pi}{N}\, n\left(m + \tfrac{1}{2}\right)\right] \tag{4.18}$$



With this normalisation, equal amplitude coefficients S(n) correspond to equal power basis functions. 
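Equations (4.17) and (4.18) can be checked numerically with a direct implementation. This Python sketch is illustrative only: the function names are hypothetical, and it evaluates the sums directly (order $N^2$ operations) rather than using the efficient algorithms referred to earlier (Ref. 32). It confirms that the pair is an exact inverse and that, with this normalisation, the energy of the samples equals the energy of the coefficients.

```python
import math


def c_norm(n):
    """C(n) of equation (4.17): 1/sqrt(2) for n = 0, otherwise 1."""
    return 1 / math.sqrt(2) if n == 0 else 1.0


def dct(s):
    """Forward DCT, equation (4.17)."""
    N = len(s)
    return [math.sqrt(2 / N) * c_norm(n)
            * sum(s[k] * math.cos(math.pi * n * (k + 0.5) / N) for k in range(N))
            for n in range(N)]


def idct(S):
    """Inverse DCT, equation (4.18)."""
    N = len(S)
    return [math.sqrt(2 / N)
            * sum(c_norm(n) * S[n] * math.cos(math.pi * n * (m + 0.5) / N) for n in range(N))
            for m in range(N)]


s = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
S = dct(s)
print([round(x, 3) for x in S])
print(max(abs(a - b) for a, b in zip(idct(S), s)))  # round-trip error, at rounding level
```

A constant block of value $v$ gives a single non-zero coefficient, $S(0) = v\sqrt{N}$, which is why a plain area costs very few coefficients.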

It is instructive to examine how the DCT, with only cosine basis functions, carries phase information, in comparison with the discrete Fourier transform, which has both cosine and sine basis functions. In the discrete Fourier transform, one can consider that a length, $X$, of signal, $g(x)$, is repeated at intervals of length $X$. This periodic function can then be represented by a sum of cosine and sine terms with frequencies $\omega_n = 2n\pi/X$ (compared with $2n\pi/2X$ for the DCT).

Considering then the example of a single Fourier component with phase $\phi$:

$$f(x) = \cos(\omega_n x + \phi)$$

giving

$$F(\omega) = \pi\left[\delta(\omega - \omega_n)\, e^{j\phi} + \delta(\omega + \omega_n)\, e^{-j\phi}\right]$$

Blocking the signal, as in equation (4.3), gives for $g(x)$ and its Fourier transform $G(\omega)$

$$g(x) = f(x)\, b(x)$$

and

$$G(\omega) = \frac{1}{2\pi}\, F(\omega) * B(\omega) = \frac{1}{2}\left[\delta(\omega - \omega_n)\, e^{j\phi} + \delta(\omega + \omega_n)\, e^{-j\phi}\right] * B(\omega)$$

where $B(\omega)$ is the Fourier transform of $b(x)$.

As the Fourier transform of $g(-x)$ is $G^*(\omega)$, the spectrum $G_r(\omega)$ of the boxed and reflected waveform, as required for the cosine transform (and as shown in Fig. 18(a)), is given by

$$G_r(\omega) = X \left\{ \frac{\sin[(\omega - \omega_n) X/2]}{(\omega - \omega_n) X/2} \cos\!\left[(\omega - \omega_n)\frac{X}{2} - \phi\right] + \frac{\sin[(\omega + \omega_n) X/2]}{(\omega + \omega_n) X/2} \cos\!\left[(\omega + \omega_n)\frac{X}{2} + \phi\right] \right\}$$

The components of $G_r(\omega)$ for $\omega > 0$ are illustrated in Fig. 18(b) for the two cases where $\phi = 0$ and $\phi = -\pi/2$ respectively.

When the periodic function $r(x)$ is generated by repeated shifting and adding, as given by equation (4.8), the above spectrum is 'combed', or picked out only at frequencies given by $n\omega_0$ (with $\omega_0 = 2\pi/2X$ as in equation (4.14)), to give values proportional to the DCT coefficients. With $\phi = 0$, the function $r(x)$ is smooth at $x = 0$ and $x = X$ and there is only one non-zero term in the combed spectrum (Fig. 18(c)). With $\phi = -\pi/2$ the function $r(x)$ has a sharp discontinuity in slope at $x = 0$ and $x = X$, giving several non-zero coefficients as shown in Fig. 18(d). These two cases correspond to the cosine and sine basis functions of the discrete Fourier transform. Therefore we can say that each Fourier cosine term is carried by a single even coefficient in the DCT and that each Fourier sine term is carried by a sequence of odd-numbered coefficients, as shown in Fig. 18(d).
Fig. 18 - Illustration of Discrete Cosine Transform (DCT) for a single frequency component: (a) blocked and reflected waveform for $f(x) = \cos(\omega_n x + \phi)$.
4.2 Two-dimensional (2-D) transform coding

The one-dimensional discrete cosine transform equations (4.17) and (4.18) can be extended
straightforwardly to two dimensions. Consider a 2-D block of sampled picture s(j,k). Performing the discrete 
cosine transform separately for each row of the block gives a set of intermediate coefficients for each row. Each 
coefficient, corresponding to a given horizontal basis function, varies as a function of its vertical position or row 
number. A second discrete cosine transformation can then be performed on each column of the block of 
intermediate coefficients. For a square block of N by N samples the resulting 2-D transform is given by: 



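$$S(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{j=0}^{N-1} \sum_{k=0}^{N-1} s(j,k) \cos\!\left[\frac{\pi}{N}\, u\left(j + \tfrac{1}{2}\right)\right] \cos\!\left[\frac{\pi}{N}\, v\left(k + \tfrac{1}{2}\right)\right] \tag{4.19}$$

and

$$s(j,k) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, S(u,v) \cos\!\left[\frac{\pi}{N}\, u\left(j + \tfrac{1}{2}\right)\right] \cos\!\left[\frac{\pi}{N}\, v\left(k + \tfrac{1}{2}\right)\right] \tag{4.20}$$

where

$$C(u) = \begin{cases} 1/\sqrt{2} & u = 0 \\ 1 & u \ne 0 \end{cases}$$

This 2-D transformation corresponds to the 2-D Fourier transform of a block of $2N$ by $2N$ samples formed by reflecting the original sample block $s(j,k)$ about two axes, as shown in Fig. 19. The two-dimensional DCT basis functions are given by the product of a horizontally-varying cosine function and a vertically-varying cosine function sampled at points $j + \frac{1}{2}$ and $k + \frac{1}{2}$, i.e.

$$s_{u,v}(j,k) = \frac{2}{N}\, C(u)\, C(v) \cos\!\left[\frac{\pi}{N}\, u\left(j + \tfrac{1}{2}\right)\right] \cos\!\left[\frac{\pi}{N}\, v\left(k + \tfrac{1}{2}\right)\right] \tag{4.21}$$

All such basis functions (for all combinations of $u$ and $v$) defined by (4.21) have equal power.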
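The row-then-column procedure described at the start of this sub-section can be checked against equation (4.19) evaluated directly. A minimal sketch with hypothetical function names and an arbitrarily chosen integer test block: the separable form needs $2N$ one-dimensional transforms of length $N$ (of order $N^3$ operations in total) instead of a full double sum for each of the $N^2$ coefficients (of order $N^4$).

```python
import math


def c_norm(u):
    """C(u) of equation (4.19): 1/sqrt(2) for u = 0, otherwise 1."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def dct1(s):
    """1-D DCT of equation (4.17)."""
    N = len(s)
    return [math.sqrt(2 / N) * c_norm(n)
            * sum(s[k] * math.cos(math.pi * n * (k + 0.5) / N) for k in range(N))
            for n in range(N)]


def dct2_separable(block):
    """2-D DCT by rows then columns; block[j][k] -> S[u][v]."""
    rows = [dct1(row) for row in block]                   # rows[j][v]
    cols = [dct1([rows[j][v] for j in range(len(rows))])  # cols[v][u]
            for v in range(len(rows[0]))]
    return [[cols[v][u] for v in range(len(cols))] for u in range(len(cols[0]))]


def dct2_direct(block):
    """Equation (4.19) evaluated directly for an N x N block."""
    N = len(block)
    return [[2 / N * c_norm(u) * c_norm(v)
             * sum(block[j][k]
                   * math.cos(math.pi * u * (j + 0.5) / N)
                   * math.cos(math.pi * v * (k + 0.5) / N)
                   for j in range(N) for k in range(N))
             for v in range(N)] for u in range(N)]


blk = [[(7 * j + 3 * k * k) % 17 for k in range(8)] for j in range(8)]
A, B = dct2_separable(blk), dct2_direct(blk)
print(max(abs(A[u][v] - B[u][v]) for u in range(8) for v in range(8)))  # rounding level
```

Because every basis function has unit energy under this normalisation, the total energy of the block is preserved in the coefficients, which is what makes it meaningful to compare coefficient amplitudes directly when assigning bits.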
Fig. 19 - Illustration of 2-D blocking for DCT: (a) $N \times N$ block of picture samples with $s(0,0)$ marked by a cross; (b) $2N \times 2N$ sample block formed by reflections of original block.

The one- and two-dimensional DCTs are examples of 'orthogonal transformations', in which each individual basis function has zero component values (or zero coefficient values) for all the other basis functions. This results from the fact that

$$\sum_{j=0}^{N-1} \sum_{k=0}^{N-1} s_{u,v}(j,k)\, s_{u',v'}(j,k) = 0 \qquad \text{for } (u,v) \ne (u',v')$$
4.3 Bit-rate reduction in the transform domain

A block of N by N image samples in the spatial domain is transformed into a block of N by N coefficients 
in the transform domain. Each coefficient represents the amplitude of a given basis function within the image data. 
Before transmission, each coefficient is quantised and coded in as few bits as possible in order to minimise the 
average bit rate per block. Excess quantisation error in any coefficient can give rise to the visibility of the 
corresponding basis function within the decoded image block. In particular, comparatively small quantisation errors 
in the d.c. coefficient ($u = 0$, $v = 0$) can cause the block structure to become visible in the decoded picture.

For virtually all picture material the coefficients corresponding to the higher horizontal and vertical 
frequencies are consistently smaller and cover a smaller range than the lower frequency coefficients. Assuming that 
the coefficients require similar quantising accuracies, then the number of levels required to code each coefficient 
will be proportional to the range covered by the given coefficient. Assuming further, that the majority of 
coefficients have a similar form of amplitude probability distribution function, then the coding range required for 
each coefficient will be proportional to the standard deviation of the amplitude distribution for that coefficient. 
Therefore, we can assign a number of bits, $n(u,v)$, to a coefficient $(u,v)$ according to its standard deviation, $\sigma(u,v)$, i.e.

$$n(u,v) = \log_2[\sigma(u,v)] + \text{const.} \tag{4.23}$$
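Equation (4.23) leaves the constant free. A minimal sketch of one way to fix it, assuming (this is not stated in the Report) that the constant is chosen so that the unrounded allocation averages a target number of bits per coefficient; the allocation is then rounded and clipped at zero, so the realised average is only approximately the target. The standard deviations in the example are hypothetical, not the measured values behind Fig. 20(a), and `bit_allocation` is an illustrative name.

```python
import math


def bit_allocation(sigma, mean_bits):
    """n(u,v) = log2 sigma(u,v) + const, equation (4.23).

    const is chosen (an assumption) so that the unrounded allocation averages
    mean_bits per coefficient; values are then rounded and clipped at zero,
    so the realised average is only approximately mean_bits.
    """
    logs = [[math.log2(s) for s in row] for row in sigma]
    flat = [x for row in logs for x in row]
    const = mean_bits - sum(flat) / len(flat)
    return [[max(0, round(x + const)) for x in row] for row in logs]


# Hypothetical 2 x 2 example: each doubling of sigma earns one more bit.
print(bit_allocation([[64.0, 16.0], [16.0, 4.0]], mean_bits=3))
# [[5, 3], [3, 1]]
```

The logarithmic rule means that each doubling of a coefficient's standard deviation, i.e. of its coding range, adds one bit, consistent with the assumption of similar quantising accuracy for all coefficients.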



For example, the standard deviations of the distributions of DCT coefficient amplitudes were measured for 
the luminance components of several standard high-quality source pictures. The block size was 8 by 8 and the 
samples within each block were taken only from the lines of one field. The values obtained for the number of bits 
per coefficient according to equation (4.23) are given in Fig. 20(a). It can be seen that the number of bits per 
coefficient decreases with horizontal and vertical frequency. A practical bit assignment map that has been used in 
high-quality picture coding is given in Fig. 20(b). 

Having assigned a given number of bits per coefficient, the average noise power in the decoded picture can be reduced by using a statistically optimum (minimum-mean-square-error) nonlinear quantiser for each coefficient. However, care must be taken when using nonlinear quantisers since they can produce occasional large quantising errors which are easily perceptible on the decoded picture. (Note that the bit assignment map of Fig. 20(b) is for 8 by 8 field-based blocks. If, alternatively, the block had included samples from 8 adjacent picture lines, then any movement of the image between fields would give rise to larger-amplitude high-frequency components, in particular those of high vertical frequency. Therefore, for picture-based blocks it is less straightforward to assign a bit map which allows significant bit-rate reduction and which takes into account a wide range of possible image movement.)
Fig. 20 - Bit assignment maps for intrafield DCT.

(a) Log2 (standard deviation) for coefficient amplitudes.

(b) Practical bit assignment map for an 8 × 8 intrafield DCT (bits per coefficient):

      1  2  3  4  5  6  7  8
  1   9  6  5  5  4  4  3  2
  2   6  5  4  4  3  3  2  1
  3   6  5  4  4  3  3  2  1
  4   5  4  4  4  3  3  2  1
  5   5  4  4  3  3  2  2  1
  6   4  4  3  3  3  2  2  0
  7   4  3  3  3  3  2  2  0
  8   4  3  3  3  3  2  2  0
The average bit rate per block for the bit map of Fig. 20(b) is approximately 3.2 bits per coefficient. To
reduce this average figure it is necessary to discard a significant proportion of the coefficients. Picture quality can 
only be maintained if this is done adaptively. For example, an average of 3/4 of the coefficients of an 8 by 8 
block are zero after quantisation (for high-quality coding). These zero coefficients need not each be coded with the 
number of bits allocated by the bit assignment map. Many different methods have been proposed for signalling the 
positions of the zero coefficients. Some techniques signal 'zones' of zero coefficients and others run-length code 
strings of zero coefficients as the block is scanned for transmission. In order to maximise the length of the strings of 
zeros, the block should be scanned in an order related to the standard deviations (or bit assignment) of the 
coefficient amplitudes. Typically, blocks are scanned diagonally. 
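Diagonal scanning and run-length coding of zero strings can be sketched as follows. This assumes the familiar zig-zag order as the diagonal scan and a hypothetical (zero-run, value) pair format in which trailing zeros are left to an end-of-block code; neither is specified in the Report, and the function names are illustrative.

```python
def zigzag_order(n):
    """Diagonal (zig-zag) scan of an n x n block, starting at the d.c. term."""
    order = []
    for d in range(2 * n - 1):                       # d = u + v: one anti-diagonal
        cells = [(u, d - u) for u in range(n) if 0 <= d - u < n]
        order.extend(cells if d % 2 else reversed(cells))  # alternate direction
    return order


def run_length_encode(block):
    """(zero_run, value) pairs in scan order; trailing zeros are dropped and
    would be signalled by an end-of-block code."""
    pairs, run = [], 0
    for u, v in zigzag_order(len(block)):
        if block[u][v] == 0:
            run += 1
        else:
            pairs.append((run, block[u][v]))
            run = 0
    return pairs


def run_length_decode(pairs, n):
    """Rebuild the n x n block from (zero_run, value) pairs."""
    seq = []
    for run, value in pairs:
        seq.extend([0] * run + [value])
    seq.extend([0] * (n * n - len(seq)))             # implied trailing zeros
    block = [[0] * n for _ in range(n)]
    for (u, v), value in zip(zigzag_order(n), seq):
        block[u][v] = value
    return block


q = [[12, 3, 0, 0], [-2, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
print(run_length_encode(q))  # [(0, 12), (0, 3), (0, -2), (0, 1)]
```

Because the scan visits low-frequency coefficients first, the zeros that dominate the high-frequency corner end up in one trailing run that costs nothing beyond the end-of-block code.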

This adaptivity gives a variable bit rate per block. Before transmission, therefore, the variable-rate data is 
written into a buffer store and read from the store at a regular rate. Typically the buffer store would have a 
capacity of at least one field at the transmission data rate. In order to prevent the buffer store overflowing or 
emptying, a signal related to the buffer occupancy is fed back periodically to control the level spacing of the coefficient quantisers. Increasing or decreasing the spacing of the quantiser levels respectively increases or decreases the number of zero coefficients, which changes the average bit rate per block accordingly. Also, with variable bit-rate
systems, extra signalling must be included to aid the recovery of the decoder after a transmission error. For example, 
some form of start-of-block code is necessary in order to prevent error extension from one block to the next. 
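The buffer feedback loop can be illustrated with a toy simulation. Every model in it is a hypothetical simplification: each block is assumed to produce about (activity / step) bits, the quantiser step is set linearly from the buffer fullness before every block, and an empty buffer is simply held at zero (in practice it would be padded). `simulate_buffer` and all parameter values are illustrative.

```python
def simulate_buffer(activities, channel_bits, capacity, step_min, step_max):
    """Toy buffer-controlled coder; all models here are hypothetical.

    Each block produces about activity/step bits (assumed rate model); the
    channel removes channel_bits per block; the quantiser step is set from
    the buffer fullness before each block (coarser when fuller).
    Returns a list of (step, occupancy) after each block."""
    occupancy = capacity / 2
    history = []
    for a in activities:
        fullness = occupancy / capacity
        step = step_min + (step_max - step_min) * fullness
        bits = a / step
        occupancy = max(0.0, occupancy + bits - channel_bits)  # 0: buffer empty, pad
        if occupancy > capacity:
            raise OverflowError("buffer overflow: data would be lost")
        history.append((step, occupancy))
    return history


# Constant picture activity: the step settles where bits per block match the channel.
hist = simulate_buffer([2000.0] * 500, channel_bits=200.0, capacity=10_000.0,
                       step_min=2.0, step_max=42.0)
print(round(hist[-1][0], 2), round(hist[-1][1] / 10_000.0, 3))  # about 10.0 and 0.2
```

With constant activity the loop settles where the bits per block equal the channel rate; a sudden rise in activity fills the buffer, which coarsens the quantiser until the rate falls back.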

An optimum system giving a variable bit rate would use variable-length or entropy coding (see Section 5) 
to code the coefficients. Ideally, a separate code for each coefficient would be used, optimised for the amplitude 
distribution function of that coefficient. Also, the quantising laws should be tailored such that the high-frequency 
coefficients are quantised more coarsely to take advantage of the eye's reduced sensitivity to high-frequency noise. 
Using such techniques, for example, the luminance component can be coded with almost imperceptible degradation 
at between 2.0 and 2.5 bits per sample. 

The examples given in this sub-section have concentrated on an 8 by 8 block size. Larger block sizes can 
give some improvement in performance but this is marginal considering the extra computational complexity 
involved. 

4.4 Hybrid Transform/DPCM Coding 

A technique that has been used for very low bit-rate coding and is now being applied at higher bit rates is 
interframe 'hybrid' Transform/DPCM coding (also known as Transform/Predictive coding). Two different forms 
of the hybrid technique have been widely investigated. In the first approach, interframe DPCM coding is applied 
to the coefficients of a 2-D intraframe transform 33 . In the second approach, 2-D transform coding is applied to an 
interframe DPCM difference signal. Block diagrams for the two different sorts of coder are shown in Figs. 21 and 
22. One advantage of the latter technique is that corrections for motion between frames can more easily be applied 
in the spatial domain. Very low bit-rate coding has been reported for these techniques for low-quality image 
applications such as teleconferencing and videophone. 
Fig. 21 - Block diagram of a basic hybrid Transform/Predictive coder/decoder.
Fig. 22 - Block diagram showing principles of hybrid Transform/Predictive coding.



5. VARIABLE-LENGTH OR ENTROPY CODING 

Variable-length coding is a bit-rate-reduction technique which can be used either alone or to augment the 
bit-rate reduction given by other techniques such as DPCM or transform coding. The technique exploits statistical 
redundancy in the signal or symbols to be transmitted where these symbols do not all occur with equal probability. 
The bit rate saving that can be achieved depends on the symbol probability distribution. In many cases the 
technique can give a significant reduction in bit rate. 

For example, suppose a DPCM coder produces a set of $N$ levels, $\{A_i\}$, after nonlinear quantisation. Some levels will occur more frequently than others. A variable-length coder assigns short codewords, for transmission, to the most probable DPCM levels and longer codewords to the levels which occur less often. If $L_i$ is the length of the codeword assigned to the level $A_i$, then the average bit rate, $R$, of the data signal for transmission is given by

$$R = \sum_{i=1}^{N} p_i\, L_i \tag{5.1}$$

where $p_i$ is the probability of occurrence of the level $A_i$.



The following sub-sections first describe how typical variable-length codes are constructed. A brief 
theoretical framework for variable-length coding is then given and some applications, in particular to DPCM 
coding, are described. Finally, some of the features and problems of a practical implementation of a variable-length 
coder and decoder are discussed. 

5.1 Construction of typical variable-length codes 

The variable-length codewords must be constructed such that the decoder, starting from the beginning of a 
new codeword, can detect when the end of that word is reached. This is ensured by arranging that no codeword 
is equal to the first part, or prefix, of any longer codeword. Several procedures have been proposed for 
designing variable-length codes, the most popular of which was proposed by Huffman³⁴. This is described here, 
briefly, by reference to the example given in Fig. 23. The eight symbols (or levels) to be transmitted occur with a 
set of probabilities, $p_i$, given in the first column of Fig. 23. Step 1 involves adding together the two smallest 
probabilities to form a larger probability, which is inserted into the list according to its size. The process is repeated 
until the set has only one member, whose value is 1 (since $\sum_i p_i = 1$). The length of the codeword for the symbol 
whose probability is x is given by the number of summations in which that probability (or a sum containing it) is 
involved. By convention, if the probability is uppermost in the pair being added together, a zero is assigned to that 
step, and otherwise a one. The codeword for the event with probability x is obtained by tracing out the path followed 
by x, working back from probability 1 to x, using the rules given above, giving the codewords in the final column. 
Note that the code is uniquely decipherable: no codeword is a prefix of any other codeword. 

Fig. 23 - Table showing the derivation of a Huffman code for an 8-symbol code. 
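The merging procedure above can be sketched in Python as follows. This is a minimal illustration, not the derivation used for Fig. 23: the symbol probabilities are hypothetical (the values of Fig. 23 are not reproduced here), and the function names are invented for the example. A heap holds the current list of probabilities; at each step the two smallest are removed, a 0 is prefixed to every codeword in one group and a 1 to every codeword in the other, and their sum is reinserted. Which branch receives the 0 is arbitrary, so individual codewords may differ from Fig. 23, but the code remains prefix-free, and every Huffman code for a given source has the same (minimum) average length, R, of equation (5.1).

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}.

    Repeatedly merges the two smallest probabilities, as in Fig. 23.
    """
    if len(probs) == 1:
        # Degenerate one-symbol source: a single 1-bit codeword.
        return {sym: "0" for sym in probs}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # smallest probability
        p1, _, group1 = heapq.heappop(heap)  # next smallest
        # Prefix one bit to every codeword in each of the two merged groups.
        merged = {sym: "0" + word for sym, word in group0.items()}
        merged.update({sym: "1" + word for sym, word in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def average_length(probs, code):
    """Average codeword length R = sum_i p_i * L_i, equation (5.1)."""
    return sum(p * len(code[sym]) for sym, p in probs.items())


# Hypothetical 8-level source (not the values of Fig. 23).
probs = {"A": 0.40, "B": 0.20, "C": 0.10, "D": 0.10,
         "E": 0.08, "F": 0.06, "G": 0.04, "H": 0.02}
code = huffman_code(probs)
for sym in sorted(code, key=lambda s: -probs[s]):
    print(sym, probs[sym], code[sym])
print("R =", average_length(probs, code), "bits/sample")
```

For these probabilities the code averages 2.56 bits/sample, compared with 3 bits/sample for a fixed-length code with eight levels.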

The set of variable-length codes can be represented pictorially by a tree diagram, as shown in Fig. 24. In 
this diagram the 'root' of the tree represents the start of every codeword and a terminal node represents the end of 
a codeword. Each codeword is represented by a path along the branches of the tree, an upward branch 
representing a 0 and a downward branch a 1. The probability of travelling along a particular branch is indicated in 
the figure. Intermediate nodes represent incomplete codewords. It can be clearly seen from this type of diagram 
that many variations in the arrangement of codewords are possible. 

Fig. 24 - Tree diagram for the 8-symbol variable-length code example of Fig. 23. 
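Decoding amounts to walking this tree: start at the root, follow one branch per received bit, and emit a symbol whenever a terminal node is reached. The sketch below assumes a code supplied as a Python dict from symbol to codeword string; the function names and the small four-symbol code in the example are hypothetical. Building the tree also checks the prefix condition of Section 5.1, since a codeword that was a prefix of another would have to be both a terminal and an intermediate node.

```python
class Leaf:
    """Terminal node of the code tree: the end of a codeword."""
    def __init__(self, symbol):
        self.symbol = symbol


def build_tree(code):
    """Build a binary tree from {symbol: codeword}; intermediate nodes are dicts keyed '0'/'1'."""
    root = {}
    for sym, word in code.items():
        if not word:
            raise ValueError("codewords must be non-empty")
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
            if isinstance(node, Leaf):
                raise ValueError(f"a prefix of {word!r} is already a codeword")
        if word[-1] in node:
            raise ValueError(f"{word!r} is a prefix of (or equal to) another codeword")
        node[word[-1]] = Leaf(sym)
    return root


def decode(bits, tree):
    """Walk the tree from the root for each codeword; return (symbols, unfinished bits)."""
    symbols, node, pending = [], tree, ""
    for bit in bits:
        node = node.get(bit)
        if node is None:
            raise ValueError("bit sequence does not match any codeword")
        pending += bit
        if isinstance(node, Leaf):  # end of a codeword reached
            symbols.append(node.symbol)
            node, pending = tree, ""
    return symbols, pending


code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical 4-symbol code
tree = build_tree(code)
print(decode("0101100111", tree))  # (['a', 'b', 'c', 'a', 'd'], '')
```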



5.2 Entropy 

The entropy of a signal can be used as a guide to the bit-rate savings that can be achieved through the use 
of variable-length coding. The entropy of a signal is equivalent to the average amount of 'information' in the 
signal. The information content carried by a symbol or codeword, i, which occurs with probability $p_i$, is defined by 

$$\text{information content of } i\text{th codeword} = -\log_2 p_i$$

i.e. more information is carried by a codeword which is unexpected and occurs rarely than by a codeword which 
occurs more frequently. The average information content per codeword from a source is the entropy, H, of the 
signal and, in terms of bits per symbol, H is given by 

$$H = -\sum_i p_i \log_2 p_i \qquad (5.2)$$



Shannon³⁵ showed that, assuming the samples are independent, this entropy is the minimum average number of 
bits per symbol with which the source can be coded. A variable-length Huffman code gives an average codeword 
length which comes fairly close to this entropy figure (it is always at least H and less than H + 1 bits per symbol). 
Therefore, the entropy of a signal gives a good, although slightly optimistic, guide to any gains that can be achieved 
through the use of variable-length coding. 
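Equation (5.2) can be evaluated directly. The sketch below (the function names and the skewed probabilities are hypothetical) computes H for a uniform eight-level source and for a skewed one. An eight-level fixed-length code uses 3 bits/symbol in both cases, whereas for the skewed source the entropy shows that about 2.49 bits/symbol is the lower bound for any code that treats the samples as independent.

```python
import math


def entropy(probs):
    """H = -sum p_i log2 p_i in bits/symbol, equation (5.2); zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information(p):
    """Information content, -log2 p, of a symbol that occurs with probability p."""
    return -math.log2(p)


uniform = [1 / 8] * 8
skewed = [0.40, 0.20, 0.10, 0.10, 0.08, 0.06, 0.04, 0.02]  # hypothetical source
print(entropy(uniform))           # 3.0 bits/symbol
print(round(entropy(skewed), 3))  # 2.491 bits/symbol
print(information(0.40), information(0.02))  # rare symbols carry more information
```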

Consider, as an example, the application of variable-length coding to the quantised difference signals 
produced by a DPCM coder, as shown in Fig. 25. If $q_i$ is the probability of occurrence of the ith quantiser output, 
$Q_i$, then the entropy, $H_Q$, of the quantised difference signal is given by 

$$H_Q = -\sum_i q_i \log_2 q_i$$

Fig. 25 - DPCM coder followed by variable-length coding. 

Each probability, $q_i$, can be given in terms of the probabilities, $p_j$, of the prediction-error signal at the input to the 
quantiser: 

$$q_i = \sum_{j\,\in\,\text{partition } i} p_j$$

where the summation is over all input levels that are quantised to the ith output level. Assuming that $p_j$ varies 
relatively slowly and that $\delta_i$ is the number of input levels within the ith partition, then 

$$q_i \approx \delta_i \bar{p}_i$$

where $\bar{p}_i$ represents an average $p_j$ for the ith partition. Thus, 

$$H_Q \approx -\sum_i \bar{p}_i \delta_i \log_2(\bar{p}_i \delta_i) \qquad (5.3)$$

If the quantiser, Q, is linear then $\delta_i = \delta$ for all i, and equation (5.3) gives 

$$H_Q \approx -\delta \sum_i \bar{p}_i \log_2 \bar{p}_i - \log_2 \delta$$

since $\sum_i \delta \bar{p}_i = \sum_i q_i = 1$. As 

$$-\delta \sum_{i\,\in\,\text{output levels}} \bar{p}_i \log_2 \bar{p}_i \approx -\sum_{j\,\in\,\text{input levels}} p_j \log_2 p_j = H_S,$$

the entropy of the prediction error signal, it follows that 

$$H_Q \approx H_S - \log_2 \delta \qquad (5.4)$$

Equation (5.4) shows how the entropy of the (unquantised) prediction error signal, $H_S$, is changed by linear 
quantisation. For example, increasing the spacing of the output levels by a factor of two reduces the entropy by 
1 bit per sample. 
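Equation (5.4) can be checked numerically. The sketch below assumes a hypothetical prediction-error distribution over 512 integer input levels with $p_j \propto 0.97^{|j|}$ (a discrete Laplacian shape that varies slowly from level to level, as the derivation requires), applies a linear quantiser that groups δ adjacent input levels into each partition, and compares the resulting entropy with $H_S - \log_2 \delta$. Because each partition holds only δ input levels, quantisation can discard at most $\log_2 \delta$ bits per sample, so $H_Q$ is never below $H_S - \log_2 \delta$; the gap stays small while $p_j$ is nearly constant across each partition.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Entropy in bits/sample of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def laplacian_levels(decay, lo=-256, hi=256):
    """Hypothetical prediction-error distribution: p_j proportional to decay**|j|."""
    weights = {j: decay ** abs(j) for j in range(lo, hi)}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def quantised_entropy(p, delta):
    """H_Q for a linear quantiser whose partitions each hold delta adjacent input levels."""
    q = defaultdict(float)
    for j, pj in p.items():
        q[j // delta] += pj  # q_i = sum of p_j over the ith partition
    return entropy(q.values())


p = laplacian_levels(0.97)
h_s = entropy(p.values())
for delta in (1, 2, 4, 8):
    h_q = quantised_entropy(p, delta)
    print(f"delta={delta}: H_Q={h_q:.3f}, H_S - log2(delta)={h_s - math.log2(delta):.3f}")
```

Each doubling of δ lowers $H_Q$ by very nearly 1 bit/sample, as equation (5.4) predicts.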

For a nonlinear quantiser with many levels, it is possible to describe the quantiser approximately as 
consisting of different regions with uniform quantisation within each region. If the separation of output levels in 
region $R_1$ is $\delta_1$, in region $R_2$ is $\delta_2$, etc., then it is fairly straightforward to show that the entropy of the 
quantised difference signal, $H_Q$, is given by 

$$H_Q \approx H_S - l_1 \log_2 \delta_1 - l_2 \log_2 \delta_2 - \cdots \qquad (5.5)$$

where 

$$l_k = \sum_{j\,\in\,\text{region } k} p_j$$

is the probability that the prediction error falls in region k. 

For typical predictors and prediction error statistics, the particular $l_k$ corresponding to the region around 
zero prediction error has the largest value. This tends to mean that the entropy of the transmitted signal is 
dependent mainly on the source entropy and the spacing of the inner levels of the nonlinear quantiser. 

With a DPCM system which includes entropy coding, as in Fig. 25, the aim is to minimise the noise in 
the decoded signal for a given average transmitted bit rate. If the mean square error is used to measure the noise, it 
has been found that the optimum quantiser, i.e. the one that minimises the noise, is linear³⁶. A nonlinear 
quantiser formed, for example, by removing some of the levels of this linear quantiser would give a smaller 
entropy, but at the expense of an increase in average noise level. Most commonly, the spacings of a DPCM 
quantiser are designed to be as coarse as possible, whilst keeping the impairments of granular noise, edge busyness 
and slope overload below the threshold of perception. In these circumstances, if N levels are required in the 
nonlinear quantiser, variable-length coding can typically give a saving of between 0.7 and 1.0 bits/sample on the 
$\log_2 N$ bits/sample required in the absence of variable-length coding. 

In the above example, the application of variable-length coding to the quantised DPCM difference signal 
has been described. Variable-length coding could equally well be applied to the transmission of coefficients in a 
transform-based bit-rate-reduction system. In order to obtain optimum performance from a transform-based system, 
each quantised coefficient would have a variable-length code optimised for the amplitude distribution of that 
coefficient. 

5.3 Practical considerations 

In most transmission systems it is necessary to have a constant bit rate entering the transmission channel. 
Therefore, with variable-length coding, a buffer is required to smooth out the irregular data rate occurring per 
sample. Data is written into the buffer at a variable rate and read from the buffer at a constant rate for 
transmission. The average input and output rates must, of course, be equal. 

In order to prevent the buffer overflowing or emptying on particular picture material, a signal describing 
the buffer occupancy is used to control the parameters of the source coder. Typically, the parameter that is varied, 
in DPCM and transform coders, is the spacing of the output levels in the prediction-error or coefficient quantisers. 
Increasing the spacing of the levels decreases the entropy of the signal and vice-versa. It is usual to have several 
quantiser modes in order to try to avoid sharp changes in picture quality when switching between modes. 

In the limit, for very difficult pictures, there must be a 'fall-back' mode with a coarse quantiser which is 
guaranteed not to cause the buffer to overflow. Similarly, there must be a strategy for avoiding buffer underflow 
for plain picture material. 

The size of buffer required depends on several factors. These include 

a) whether the coding scheme is intra- or inter-field, 

b) how often the buffer occupancy signal is fed back to the source coder and 

c) the maximum and minimum source bit rates from the coder. 

For example, intrafield coders only benefit from comparatively small buffers. In order to estimate the buffer 
size required, suppose that a proportion, x, of the field is very active and gives rise to codewords of the maximum 
length possible with a practical code, say s bits/sample. Also, suppose that the remainder of the field contains 
plain, inactive picture material and gives rise to codewords of the minimum length, r bits/sample. If the 
transmission channel removes samples at t bits/(input) sample and assuming that the picture is stable for a few 
fields, then there is no need for the buffer to be larger than the amount by which it can be emptied during the 
inactive picture area (and filled during the active picture area). If N is the number of input samples per field, the 
maximum required buffer capacity, $B_m$, is given by 

$$B_m = N(1 - x)(t - r) \text{ bits.}$$

For a stable picture the buffer must fill during the active area by as much as it empties during the inactive area, so 

$$(s - t)\,x = (t - r)(1 - x), \quad\text{i.e.}\quad x = \frac{t - r}{s - r},$$

and hence 

$$B_m = \frac{N(s - t)(t - r)}{s - r} \text{ bits} = \frac{(s - t)(t - r)}{(s - r)\,t} \text{ fields at the transmitted bit rate,}$$

since one field at the transmitted bit rate contains Nt bits. 

Taking a typical example with r = 1, s = 16 and t = 2.5, $B_m$ = 0.54 fields at the transmitted bit rate. 
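The buffer-size estimate can be written as a small function. This is only a restatement of the formulas above, with hypothetical names; N, the number of input samples per field, is optional and, if given, is used to express $B_m$ in bits as well as in fields at the transmitted bit rate.

```python
def max_buffer(r, s, t, n_samples=None):
    """Worst-case coder buffer size for the two-region picture model of Section 5.3.

    r: minimum codeword length (bits/sample) in plain areas
    s: maximum codeword length (bits/sample) in active areas
    t: channel rate in bits per input sample, with r < t < s
    n_samples: optional number of input samples per field, N
    Returns (x, B_m in fields at the transmitted rate, B_m in bits or None).
    """
    if not r < t < s:
        raise ValueError("need r < t < s")
    x = (t - r) / (s - r)  # active fraction that keeps the buffer balanced
    fields = (s - t) * (t - r) / ((s - r) * t)
    bits = None if n_samples is None else n_samples * (1 - x) * (t - r)
    return x, fields, bits


x, fields, _ = max_buffer(r=1, s=16, t=2.5)
print(f"x = {x:.2f}, B_m = {fields:.2f} fields")  # x = 0.10, B_m = 0.54 fields
```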

A buffer store is also required at the decoder. In this buffer the data is written into the store at a constant 
rate from the transmission channel and read out from the store at a variable rate by the variable-length-codeword 
decoder. It is easy to show that, for a given point in a picture, the coder buffer occupancy and the decoder buffer 
occupancy are complementary. If $B_c(\mathbf{r}, n)$ is the coder buffer occupancy at the time when the data corresponding 
to the point $\mathbf{r}$ in frame number n is being written into the coder buffer, and $B_d(\mathbf{r}, n)$ is the decoder buffer 
occupancy at some later time when the same data is being read from the decoder buffer, then 

$$\mathrm{d}B_c(\mathbf{r}, n) = -\,\mathrm{d}B_d(\mathbf{r}, n)$$

i.e. any change in the coder buffer occupancy from one point in the picture to the next is matched by an equal and 
opposite change in the decoder buffer occupancy, so that 

$$B_c(\mathbf{r}, n) + B_d(\mathbf{r}, n) = \text{const.} \qquad (5.6)$$

Fig. 26 gives an example of how the coder and decoder buffer states change with time. The coder read 
address and the decoder write address vary at a constant rate determined by the transmission channel. The coder 
write address and the decoder read address vary at a rate dependent on the picture content and state of the coder. 
The times, $T_c$ and $T_d$, for which data corresponding to a given sample is in the coder and decoder buffers, are 
given by 

$$T_c = \frac{B_c(\mathbf{r}, n)}{t} \quad\text{and}\quad T_d = \frac{B_d(\mathbf{r}, n)}{t}$$

where t is the transmission rate in bits/sample (so that $T_c$ and $T_d$ are measured in sample periods). 

The total delay, T, through the system is a constant given by 

$$T = T_c + T_d = \frac{1}{t}\left[B_c(\mathbf{r}, n) + B_d(\mathbf{r}, n)\right] \qquad (5.7)$$



In order that the decoder and coder buffers can be synchronised, it is necessary to send periodic 
information to the decoder about the expected decoder buffer occupancy. This is known at the coder from 
equation (5.6). 
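Equations (5.6) and (5.7) can be illustrated with a simple bookkeeping model. In the sketch below, which is an illustration under stated assumptions rather than a description of any particular coder, each sample produces a codeword whose length is given in a list, the channel carries a constant t bits per sample period, and each codeword is read from the decoder buffer a fixed number of sample periods (the hypothetical parameter `delay`) after it was written into the coder buffer. The codeword lengths mirror the buffer-size example above for a hypothetical 100-sample field: 10 active samples at s = 16 bits followed by 90 plain samples at r = 1 bit, with t = 2.5, so that x = 0.1 and the peak coder occupancy is $N(1 - x)(t - r)$ = 135 bits. A delay of 60 sample periods is long enough for every codeword to have arrived at the decoder before it is needed.

```python
def buffer_states(lengths, t, delay):
    """Coder and decoder buffer occupancies (bits) for each sample in scan order.

    lengths: codeword length in bits produced for each sample
    t: constant channel rate in bits per sample period
    delay: sample periods between writing a codeword into the coder buffer
           and reading it from the decoder buffer (hypothetical, fixed)
    Occupancies are taken just before sample k's codeword is written (coder)
    and just before it is read (decoder). Overflow and underflow are not modelled.
    """
    states = []
    written = 0  # bits written into the coder buffer so far
    for k, length in enumerate(lengths):
        sent = t * k                # bits read out to the channel by time k
        received = t * (k + delay)  # bits delivered to the decoder by time k + delay
        b_c = written - sent        # waiting in the coder buffer
        b_d = received - written    # waiting in the decoder buffer
        states.append((b_c, b_d))
        written += length
    return states


lengths = [16] * 10 + [1] * 90  # hypothetical field: x = 0.1, s = 16, r = 1
t, delay = 2.5, 60
states = buffer_states(lengths, t, delay)
print(max(b_c for b_c, _ in states))       # 135.0 (peak coder occupancy)
print({b_c + b_d for b_c, b_d in states})  # {150.0}, constant as in (5.6)
print((states[0][0] + states[0][1]) / t)   # 60.0, the total delay T of (5.7)
```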

Fig. 26 - Diagram illustrating buffer states in a variable-length coding system. 

In the presence of transmission errors, variable-length coding schemes give rise to error extension. If a 
codeword is received in error it may be interpreted as a codeword of a different length; the next codeword will then 
also be misinterpreted, and so on, and the decoder could completely lose synchronisation. In general, the 
variable-length decoder will eventually re-synchronise, and correct interpretation of the transmitted bit stream will 
be resumed. It is desirable to limit the effect of synchronisation loss as much as possible. The average time for 
re-synchronisation depends on the construction of the particular variable-length code, but it is possible to design 
codes with comparatively rapid error recovery properties³⁷. 
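Error extension is easy to demonstrate. The sketch below uses a hypothetical four-symbol prefix code (not one from this Report), encodes a short message, inverts a single bit and decodes the result. The decoder misreads the first two codewords as three and then re-synchronises, but the decoded sequence has gained a symbol, so every later sample would be displaced; this is one reason why additional signalling to locate the start of lines or blocks is useful.

```python
def encode(symbols, code):
    """Concatenate the codewords for a sequence of symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    """Decode a prefix code by accumulating bits until they form a codeword."""
    lookup = {word: sym for sym, word in code.items()}
    symbols, pending = [], ""
    for bit in bits:
        pending += bit
        if pending in lookup:
            symbols.append(lookup[pending])
            pending = ""
    return symbols, pending


def flip(bits, pos):
    """Simulate a single transmission error at bit position pos."""
    return bits[:pos] + ("1" if bits[pos] == "0" else "0") + bits[pos + 1:]


code = {"a": "0", "b": "10", "c": "110", "d": "111"}  # hypothetical code
message = list("abcdabcd")
sent = encode(message, code)
received = flip(sent, 1)                    # corrupt the first bit of 'b'
print("".join(message))                     # abcdabcd
print("".join(decode(received, code)[0]))   # aaacdabcd
```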

In practical systems, the decoder buffer is usually made larger than the coder buffer. This is because 
transmission errors cause the decoder buffer occupancy to deviate from what is expected according to equation 
(5.6). A larger decoder buffer allows the decoder buffer occupancy to vary over a wider tolerance before data is 
lost. Loss of data would complicate and inhibit any buffer resetting strategy. In addition to buffer resetting, 
strategies must be included to limit the impairments introduced by the variable-length code error extension. 
Typically, this might involve additional signalling to help locate, after transmission errors, the start of new lines or 
blocks. Error concealment techniques can then be used. 



6. VECTOR QUANTISATION 

The DPCM and transform coding systems described in the previous sections use what is often referred to 
as 'scalar quantisation'. This means that the amplitude of the prediction error at each pixel (or the amplitude of 
each coefficient) is quantised independently of the amplitudes of the prediction errors at neighbouring pixels (or the 
amplitudes of neighbouring coefficients). 

In 'vector quantisation' a block of samples is quantised collectively. For example, consider the relatively 
simple case of a block of two luminance samples. If the two-sample combination is described as shown in Fig. 27, 
where the amplitudes of the two samples are drawn along two perpendicular axes, then any pattern of two samples 
can be represented by a vector in the manner shown. Also, suppose that only one bit per sample is available for 
transmission, giving two bits per block of two samples. Four different states or patterns can then be assigned to the 
block. These four representative states would be chosen according to the expected distribution of input vectors for 
blocks in the image to be coded. The process of coding an image at one bit per sample then consists of comparing 
each two-sample block with each of the four representative states and choosing the state which gives the minimum 
error or distortion. Fig. 28 illustrates this 'block quantisation' on the vector diagram. All input vectors within the 
partitions P1 to P4 are represented by the states 1 to 4 respectively. At the decoder, one of only four representative 
states is reproduced, corresponding to the codeword received from the coder. The set of representative vectors is 
referred to as a 'vector quantiser', 'block quantiser' or 'codebook'. 
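The coding step for the two-sample case can be sketched as follows. The four-vector codebook and the function names are hypothetical: amplitudes are measured from mid-grey, and all four vectors are placed on the diagonal of Fig. 27 on the assumption that neighbouring samples are usually similar. Each block is coded by the index (two bits) of the codebook vector giving the minimum squared error, which gives the one bit per sample of the example in the text.

```python
def squared_error(a, b):
    """Distortion between two blocks: sum of squared sample differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(blocks, codebook):
    """Index of the minimum-distortion codebook vector for each block."""
    return [min(range(len(codebook)), key=lambda i: squared_error(block, codebook[i]))
            for block in blocks]


def vq_decode(indices, codebook):
    """The decoder simply looks up the representative vector for each index."""
    return [codebook[i] for i in indices]


# Hypothetical 4-vector codebook for 2-sample blocks (2 bits/block = 1 bit/sample).
codebook = [(-48, -48), (-16, -16), (16, 16), (48, 48)]
line = [-50, -45, -20, -12, 14, 20, 50, 44]  # one line of samples
blocks = [tuple(line[i:i + 2]) for i in range(0, len(line), 2)]
indices = vq_encode(blocks, codebook)
print(indices)  # [0, 1, 2, 3]
print(vq_decode(indices, codebook))
```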

Fig. 27 - 2-D representation of a block of two samples. 

Fig. 28 - 2-D example of block quantisation. (All input blocks falling within region P1 are quantised as vector 1, etc.) 

A block of two samples leads to a two-dimensional representation. Similarly, a k-sample block leads to a 
representation in which a k-dimensional vector corresponds to a particular pattern in the k-sample block. 
Following the example of Fig. 27, the mth axis in k dimensions describes the set of patterns in which all the k 
samples are zero (mid-grey) except for the mth sample. The axes are said to be orthogonal because the mth 
coordinate value (or the mth sample amplitude) can be varied without altering the value of the other k - 1 
coordinates (or sample values). Other axes could easily be chosen to describe the coordinates of the vector in 
k dimensions. For example, any 'orthogonal' transformation, such as the DCT, represents a re-orientation (e.g. 
rotation) of the k orthogonal axes. 
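A short numerical check illustrates why an orthogonal transformation is only a change of axes. The sketch below implements an orthonormal DCT of a k-sample block directly from its definition (a hypothetical helper, not an optimised transform) and confirms that the squared distance between two blocks is the same before and after the transform, so a nearest-neighbour search gives the same answer in either domain. It also shows that a two-sample block with equal samples, which lies on the diagonal of Fig. 27, has all its energy in the first coefficient.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a k-sample block, computed from the definition."""
    k = len(block)
    coeffs = []
    for u in range(k):
        scale = math.sqrt((1 if u == 0 else 2) / k)
        coeffs.append(scale * sum(x * math.cos(math.pi * (2 * n + 1) * u / (2 * k))
                                  for n, x in enumerate(block)))
    return coeffs


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = [12, 15, 9, -4, 0, 7, 3, -10]  # two arbitrary 8-sample blocks
b = [10, 14, 11, -2, 1, 5, 0, -8]
print(squared_distance(a, b), squared_distance(dct(a), dct(b)))  # equal apart from rounding
print(dct([10, 10]))  # first coefficient 10*sqrt(2), second (almost exactly) zero
```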

The task of a vector quantisation coder involves lengthy computations. First, a set of representative vectors 
must be chosen to fit the block size, the average number of bits/sample and the expected distribution of image 
source vectors. Secondly, the coder must determine which of the representative vectors approximates most closely 
any particular input vector according to some distortion measure. Finally, the codeword of the appropriate 
codebook vector is sent to the decoder. Normally, the codebook of vectors is pre-determined. The task of the 
decoder is comparatively simple, being required only to reconstruct a pattern from the codebook corresponding to 
the codeword received. 

The positioning of the codebook vectors depends on the distribution of input vectors in the images to be 
coded and on the choice of distortion measure. The images, or parts of an image, used to develop the vector 
quantiser are referred to as the 'training sequence'. The following sub-section briefly describes a technique 
due to Linde, Buzo and Gray³⁸ for designing an optimal vector quantiser. 

6.1 Codebook generation 

For a given size of codebook, an optimum quantiser minimises the average distortion in the quantised 
signal measured over the training sequence. Many distortion measures have been proposed³⁸, but the most 
commonly used and most tractable is the mean-square-error measure. Then, for a distribution of vectors, $\mathbf{x}_j$, within 
a specified partition, $P_i$, the codebook vector, $\mathbf{y}_i$, which minimises the quantisation distortion for that partition is 
given by the 'centre of gravity' or 'centroid' of the distribution within $P_i$, 

i.e. 

$$\mathbf{y}_i = \frac{1}{N_p} \sum_{\mathbf{x}_j \in P_i} \mathbf{x}_j \qquad (6.1)$$

where $N_p$ is the number of unquantised vectors within $P_i$. 

This technique for codebook generation is an iterative process, which is illustrated here assuming a 
mean-square-error distortion measure. The process, for an N-vector quantiser, has the following stages: 

1) First, an initial codebook or set of N vectors is assumed. This might, for example, be N points spread 
uniformly throughout a k-dimensional volume. 

2) The k-space is then partitioned (as illustrated in Fig. 28) such that any input vector will be quantised 
to its nearest-neighbour codebook vector. 

3) Using the training sequence, the centroid of the vectors falling within each partition is calculated. The 
centroid minimises the distortion for the given partition and is taken as the new codebook vector for 
that partition. After this stage, therefore, there exists a modified codebook. 

4) Steps 2 and 3 are then repeated, and so on. 

On each iteration of the above process, the partitioning will change and each new partition will contain a 
slightly different selection of input vectors. The process converges to a quantiser giving a (possibly local) minimum 
of the average distortion for the training sequence. At each stage of the iteration, the average distortion $D_m$ is 
calculated, and the process is stopped when the fractional change in distortion is small, 

i.e. $(D_{m-1} - D_m)/D_m \le \epsilon$. 

A variation of the above technique, which is often used to generate M-level quantisers where $M = 2^R$, 
$R = 0, 1, 2, \ldots$, is as follows: 

1) The centroid, $\mathbf{y}$, of the training sequence is taken and split into two close vectors: $\mathbf{y} + \boldsymbol{\delta}$ and $\mathbf{y} - \boldsymbol{\delta}$. 

2) The training sequence is then partitioned between the two vectors and the previously described 
optimising procedure is followed to derive a two-level optimum quantiser. 

3) Each of the two vectors is split again and the training set partitioned into four and optimised to give 
an optimum four-level quantiser. 

4) The procedure is repeated until the distortion has been reduced to the required level, or the number of 
codebook vectors has reached the required number. 
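The iterative procedure and the splitting variation can be combined in a short sketch. It assumes the mean-square-error distortion measure, represents vectors as Python tuples, splits each codebook vector by adding and subtracting the same small amount (the hypothetical parameter `delta`) to every component, and, when a partition receives no training vectors, simply keeps its old codebook vector; none of these details is prescribed by the text, and the function names are invented. The stopping rule is $(D_{m-1} - D_m)/D_m \le \epsilon$, and the requested number of levels is assumed to be a power of two.

```python
def sq_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Equation (6.1): component-wise mean of the vectors in a partition."""
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))


def nearest(v, codebook):
    return min(range(len(codebook)), key=lambda i: sq_err(v, codebook[i]))


def improve(training, codebook, eps=1e-3, max_iter=100):
    """Steps 2-4: repartition, replace each vector by its centroid, repeat."""
    previous = None
    for _ in range(max_iter):
        cells = [[] for _ in codebook]
        distortion = 0.0
        for v in training:
            i = nearest(v, codebook)
            cells[i].append(v)
            distortion += sq_err(v, codebook[i])
        distortion /= len(training)  # D_m for the current codebook
        # An empty partition keeps its old vector (a hypothetical choice).
        codebook = [centroid(c) if c else y for c, y in zip(cells, codebook)]
        if previous is not None and previous - distortion <= eps * distortion:
            break
        previous = distortion
    return codebook


def lbg(training, levels, delta=0.01, eps=1e-3):
    """Grow a codebook by splitting, 1 -> 2 -> 4 ... vectors, optimising at each size."""
    codebook = [centroid(training)]
    while len(codebook) < levels:  # levels assumed to be 2**R
        codebook = [tuple(c + sign * delta for c in y)
                    for y in codebook for sign in (1, -1)]
        codebook = improve(training, codebook, eps)
    return codebook


# Hypothetical training sequence: 2-sample blocks around four grey levels.
centres = [(-40, -40), (-10, -10), (10, 10), (40, 40)]
training = [(cx + dx, cy + dy) for cx, cy in centres
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
print(sorted(lbg(training, 4)))  # [(-40.0, -40.0), (-10.0, -10.0), (10.0, 10.0), (40.0, 40.0)]
```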

Generation of a codebook is obviously a lengthy process, particularly if a long training sequence is used. It 
is also a lengthy task for the coder to assign an input vector to its nearest-neighbour codebook vector. The 
following processes are required for each vector in the codebook: 

1) Subtraction of the codebook vector from the input block vector on a sample-by-sample basis. 

2) Summation of the squares of the sample-differences over the block. 

3) Finding the codebook vector that gives the minimum distortion. 

In order to reduce the number of operations involved, tree-search routines have been proposed. In these 
routines an input vector is first compared with two vectors each representing half the codebook. The appropriate 
half of the codebook is then halved again and the comparison repeated, and so on. For a codebook of N vectors, a 
binary tree search needs only about $2\log_2 N$ distance calculations instead of N, but it ends at a nearby codebook 
vector that is not necessarily the nearest-neighbour vector. The performance of the tree-search technique is 
therefore not as good as that of a full-search routine. 
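A binary tree-search quantiser can be sketched by training a two-vector quantiser at every node on the training vectors that reach that node. Everything here is a hypothetical illustration: the node quantiser is a simple split-and-refine procedure in the style of Section 6.1, the training data are random, correlated sample pairs, the function names are invented, and a node stops splitting if it receives too few vectors. The check at the end compares the tree search with a full search over the same leaf vectors; the full search is never worse, while the tree search uses two distance calculations per level instead of one per leaf.

```python
import random


def sq_err(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))


def partition(vectors, pair):
    """Split vectors between the two node vectors (ties go to the first)."""
    cells = ([], [])
    for v in vectors:
        cells[0 if sq_err(v, pair[0]) <= sq_err(v, pair[1]) else 1].append(v)
    return cells


def two_level(vectors, delta=0.01, iters=20):
    """Two-vector quantiser for one node: split the centroid, then refine."""
    y = centroid(vectors)
    pair = (tuple(c + delta for c in y), tuple(c - delta for c in y))
    for _ in range(iters):
        cells = partition(vectors, pair)
        pair = tuple(centroid(c) if c else p for c, p in zip(cells, pair))
    return pair


def build(vectors, depth):
    """A node is (vector, children); children is None at a leaf."""
    if depth == 0 or len(vectors) < 2:
        return (centroid(vectors), None)
    cells = partition(vectors, two_level(vectors))
    if not cells[0] or not cells[1]:
        return (centroid(vectors), None)
    return (centroid(vectors), tuple(build(c, depth - 1) for c in cells))


def tree_search(v, node):
    """Descend by comparing with the two child vectors at each level."""
    while node[1] is not None:
        node = min(node[1], key=lambda child: sq_err(v, child[0]))
    return node[0]


def leaves(node):
    if node[1] is None:
        return [node[0]]
    return leaves(node[1][0]) + leaves(node[1][1])


def full_search(v, codebook):
    return min(codebook, key=lambda y: sq_err(v, y))


rng = random.Random(1)  # fixed seed for a repeatable example
training = []
for _ in range(400):
    a = rng.gauss(0, 30)  # hypothetical correlated sample pairs
    training.append((a, a + rng.gauss(0, 5)))
tree = build(training, depth=3)  # up to 2**3 = 8 leaf vectors
codebook = leaves(tree)
worse = sum(sq_err(v, tree_search(v, tree)) > sq_err(v, full_search(v, codebook))
            for v in training)
print(len(codebook), "leaf vectors;", worse, "training vectors not given the nearest neighbour")
```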

6.2 Vector quantiser performance 

Because of the computational difficulty of codebook generation, the block sizes in studies described in the 
literature have been small, typically fewer than 16 samples. Also, because of the heavy computational 
load involved in quantisation, the number of vectors in typical codebooks has been limited to give average bit rates 
of between 0.5 and 2 bits/sample. At these bit rates, vector quantisation appears to perform only slightly better 
than other techniques such as the DCT³⁹. However, the study of vector quantisation is comparatively recent and 
many variations and new results are being reported in the literature. Developments include adaptive, differential 
and colour vector quantisation; in the latter, the three YUV components are treated together as one vector⁴⁰. 

At the time of writing, vector quantisation is not a practicable technique for real-time high-quality coding 
because of the high computational requirements. However, the development of sophisticated image-processing 
VLSI is progressing rapidly and vector quantisation may prove to be a powerful bit-rate-reduction technique for 
future high-quality systems. 

7. CONCLUSIONS 

In this Report an attempt has been made to explain some of the most commonly used techniques for image 
coding for long-distance transmission, some of which have been successfully used in the BBC. Each of the 
techniques described has numerous and sophisticated variations, which it has not been possible to describe 
here. The survey papers¹,², which contain extensive bibliographies, are a good starting point for anyone wishing 
to study the literature on a particular technique. 